[Pcre-svn] [453] code/trunk: Add more explanation about recursive subpatterns, and make it possible to

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [453] code/trunk: Add more explanation about recursive subpatterns, and make it possible to

Revision: 453

          http://vcs.pcre.org/viewvc?view=rev&revision=453
Author:   ph10
Date:     2009-09-18 20:12:35 +0100 (Fri, 18 Sep 2009)

Log Message:
-----------
Add more explanation about recursive subpatterns, and make it possible to
process the documentation without building a whole release.

Modified Paths:
--------------
    code/trunk/PrepareRelease
    code/trunk/doc/html/pcre_dfa_exec.html
    code/trunk/doc/html/pcre_exec.html
    code/trunk/doc/html/pcreapi.html
    code/trunk/doc/html/pcrebuild.html
    code/trunk/doc/html/pcrecompat.html
    code/trunk/doc/html/pcredemo.html
    code/trunk/doc/html/pcregrep.html
    code/trunk/doc/html/pcrematching.html
    code/trunk/doc/html/pcrepartial.html
    code/trunk/doc/html/pcrepattern.html
    code/trunk/doc/html/pcreposix.html
    code/trunk/doc/html/pcretest.html
    code/trunk/doc/pcre.txt
    code/trunk/doc/pcrecompat.3
    code/trunk/doc/pcredemo.3
    code/trunk/doc/pcregrep.txt
    code/trunk/doc/pcrepattern.3
    code/trunk/doc/pcretest.txt
    code/trunk/testdata/testinput11
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput11
    code/trunk/testdata/testoutput2

Modified: code/trunk/PrepareRelease
===================================================================
--- code/trunk/PrepareRelease    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/PrepareRelease    2009-09-18 19:12:35 UTC (rev 453)
@@ -4,8 +4,9 @@
 # processing of the documentation, detrails files, and creates pcre.h.generic
 # and config.h.generic (for use by builders who can't run ./configure).

-# You must run this script before runnning "make dist". It makes use of the
-# following files:
+# You must run this script before runnning "make dist". If its first argument
+# is "doc", it stops after preparing the documentation. There are no other
+# arguments. The script makes use of the following files:

 # 132html     A Perl script that converts a .1 or .3 man page into HTML. It
 #             "knows" the relevant troff constructs that are used in the PCRE
@@ -119,6 +120,7 @@
 # Exclude table of contents for function summaries. It seems that expr
 # forces an anchored regex. Also exclude them for small pages that have
 # only one section.
+
 for file in *.3 ; do
   base=`basename $file .3`
   toc=-toc
@@ -134,10 +136,11 @@
   if [ $? != 0 ] ; then exit 1; fi
 done

-# End of documentation processing
+# End of documentation processing; stop if only documentation required.

 cd ..
 echo Documentation done
+if [ "$1" = "doc" ] ; then exit; fi

 # These files are detrailed; do not detrail the test data because there may be
 # significant trailing spaces. The configure files are also omitted from the

Modified: code/trunk/doc/html/pcre_dfa_exec.html
===================================================================
--- code/trunk/doc/html/pcre_dfa_exec.html    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/html/pcre_dfa_exec.html    2009-09-18 19:12:35 UTC (rev 453)
@@ -48,27 +48,29 @@
 </pre>
 The options are:
 <pre>
-  PCRE_ANCHORED      Match only at the first position
-  PCRE_BSR_ANYCRLF   \R matches only CR, LF, or CRLF
-  PCRE_BSR_UNICODE   \R matches all Unicode line endings
-  PCRE_NEWLINE_ANY   Recognize any Unicode newline sequence
-  PCRE_NEWLINE_ANYCRLF  Recognize CR, LF, and CRLF as newline sequences
-  PCRE_NEWLINE_CR    Set CR as the newline sequence
-  PCRE_NEWLINE_CRLF  Set CRLF as the newline sequence
-  PCRE_NEWLINE_LF    Set LF as the newline sequence
-  PCRE_NOTBOL        Subject is not the beginning of a line
-  PCRE_NOTEOL        Subject is not the end of a line
-  PCRE_NOTEMPTY      An empty string is not a valid match
-  PCRE_NO_START_OPTIMIZE  Do not do "start-match" optimizations
-  PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
-                       validity (only relevant if PCRE_UTF8
-                       was set at compile time)
-  PCRE_PARTIAL       ) Return PCRE_ERROR_PARTIAL for a partial match 
-  PCRE_PARTIAL_SOFT  )   if no full matches are found
-  PCRE_PARTIAL_HARD  Return PCRE_ERROR_PARTIAL for a partial match 
-                       even if there is a full match as well 
-  PCRE_DFA_SHORTEST  Return only the shortest match
-  PCRE_DFA_RESTART   This is a restart after a partial match
+  PCRE_ANCHORED          Match only at the first position
+  PCRE_BSR_ANYCRLF       \R matches only CR, LF, or CRLF
+  PCRE_BSR_UNICODE       \R matches all Unicode line endings
+  PCRE_NEWLINE_ANY       Recognize any Unicode newline sequence
+  PCRE_NEWLINE_ANYCRLF   Recognize CR, LF, & CRLF as newline sequences
+  PCRE_NEWLINE_CR        Recognize CR as the only newline sequence
+  PCRE_NEWLINE_CRLF      Recognize CRLF as the only newline sequence
+  PCRE_NEWLINE_LF        Recognize LF as the only newline sequence
+  PCRE_NOTBOL            Subject is not the beginning of a line
+  PCRE_NOTEOL            Subject is not the end of a line
+  PCRE_NOTEMPTY          An empty string is not a valid match
+  PCRE_NOTEMPTY_ATSTART  An empty string at the start of the subject
+                           is not a valid match
+  PCRE_NO_START_OPTIMIZE Do not do "start-match" optimizations
+  PCRE_NO_UTF8_CHECK     Do not check the subject for UTF-8
+                           validity (only relevant if PCRE_UTF8
+                           was set at compile time)
+  PCRE_PARTIAL           ) Return PCRE_ERROR_PARTIAL for a partial
+  PCRE_PARTIAL_SOFT      )   match if no full matches are found
+  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match 
+                           even if there is a full match as well 
+  PCRE_DFA_SHORTEST      Return only the shortest match
+  PCRE_DFA_RESTART       Restart after a partial match
 </pre>
 There are restrictions on what may appear in a pattern when using this matching
 function. Details are given in the

Modified: code/trunk/doc/html/pcre_exec.html
===================================================================
--- code/trunk/doc/html/pcre_exec.html    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/html/pcre_exec.html    2009-09-18 19:12:35 UTC (rev 453)
@@ -44,25 +44,27 @@
 </pre>
 The options are:
 <pre>
-  PCRE_ANCHORED      Match only at the first position
-  PCRE_BSR_ANYCRLF   \R matches only CR, LF, or CRLF
-  PCRE_BSR_UNICODE   \R matches all Unicode line endings
-  PCRE_NEWLINE_ANY   Recognize any Unicode newline sequence
-  PCRE_NEWLINE_ANYCRLF  Recognize CR, LF, and CRLF as newline sequences
-  PCRE_NEWLINE_CR    Set CR as the newline sequence
-  PCRE_NEWLINE_CRLF  Set CRLF as the newline sequence
-  PCRE_NEWLINE_LF    Set LF as the newline sequence
-  PCRE_NOTBOL        Subject is not the beginning of a line
-  PCRE_NOTEOL        Subject is not the end of a line
-  PCRE_NOTEMPTY      An empty string is not a valid match
-  PCRE_NO_START_OPTIMIZE  Do not do "start-match" optimizations
-  PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
-                       validity (only relevant if PCRE_UTF8
-                       was set at compile time)
-  PCRE_PARTIAL       ) Return PCRE_ERROR_PARTIAL for a partial match 
-  PCRE_PARTIAL_SOFT  )   if no full matches are found
-  PCRE_PARTIAL_HARD  Return PCRE_ERROR_PARTIAL for a partial match 
-                       even if there is a full match as well 
+  PCRE_ANCHORED          Match only at the first position
+  PCRE_BSR_ANYCRLF       \R matches only CR, LF, or CRLF
+  PCRE_BSR_UNICODE       \R matches all Unicode line endings
+  PCRE_NEWLINE_ANY       Recognize any Unicode newline sequence
+  PCRE_NEWLINE_ANYCRLF   Recognize CR, LF, & CRLF as newline sequences
+  PCRE_NEWLINE_CR        Recognize CR as the only newline sequence
+  PCRE_NEWLINE_CRLF      Recognize CRLF as the only newline sequence
+  PCRE_NEWLINE_LF        Recognize LF as the only newline sequence
+  PCRE_NOTBOL            Subject string is not the beginning of a line
+  PCRE_NOTEOL            Subject string is not the end of a line
+  PCRE_NOTEMPTY          An empty string is not a valid match
+  PCRE_NOTEMPTY_ATSTART  An empty string at the start of the subject
+                           is not a valid match
+  PCRE_NO_START_OPTIMIZE Do not do "start-match" optimizations
+  PCRE_NO_UTF8_CHECK     Do not check the subject for UTF-8
+                           validity (only relevant if PCRE_UTF8
+                           was set at compile time)
+  PCRE_PARTIAL           ) Return PCRE_ERROR_PARTIAL for a partial
+  PCRE_PARTIAL_SOFT      )   match if no full matches are found
+  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match 
+                           even if there is a full match as well 
 </pre>
 For details of partial matching, see the
 <a href="pcrepartial.html"><b>pcrepartial</b></a>

Modified: code/trunk/doc/html/pcreapi.html
===================================================================
--- code/trunk/doc/html/pcreapi.html    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/html/pcreapi.html    2009-09-18 19:12:35 UTC (rev 453)
@@ -175,9 +175,10 @@
 A second matching function, <b>pcre_dfa_exec()</b>, which is not
 Perl-compatible, is also provided. This uses a different algorithm for the
 matching. The alternative algorithm finds all possible matches (at a given
-point in the subject), and scans the subject just once. However, this algorithm
-does not return captured substrings. A description of the two matching
-algorithms and their advantages and disadvantages is given in the
+point in the subject), and scans the subject just once (unless there are
+lookbehind assertions). However, this algorithm does not return captured
+substrings. A description of the two matching algorithms and their advantages
+and disadvantages is given in the
 <a href="pcrematching.html"><b>pcrematching</b></a>
 documentation.
 </P>
@@ -1017,10 +1018,10 @@
 <pre>
   PCRE_INFO_OKPARTIAL
 </pre>
-Return 1 if the pattern can be used for partial matching, otherwise 0. The
-fourth argument should point to an <b>int</b> variable. From release 8.00, this
-always returns 1, because the restrictions that previously applied to partial
-matching have been lifted. The
+Return 1 if the pattern can be used for partial matching with
+<b>pcre_exec()</b>, otherwise 0. The fourth argument should point to an
+<b>int</b> variable. From release 8.00, this always returns 1, because the
+restrictions that previously applied to partial matching have been lifted. The
 <a href="pcrepartial.html"><b>pcrepartial</b></a>
 documentation gives details of partial matching.
 <pre>
@@ -1224,8 +1225,8 @@
 is exceeded, <b>pcre_exec()</b> returns PCRE_ERROR_RECURSIONLIMIT.
 </P>
 <P>
-The <i>pcre_callout</i> field is used in conjunction with the "callout" feature,
-which is described in the
+The <i>callout_data</i> field is used in conjunction with the "callout" feature,
+and is described in the
 <a href="pcrecallout.html"><b>pcrecallout</b></a>
 documentation.
 </P>
@@ -1248,8 +1249,9 @@
 <P>
 The unused bits of the <i>options</i> argument for <b>pcre_exec()</b> must be
 zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_<i>xxx</i>,
-PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_START_OPTIMIZE,
-PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and PCRE_PARTIAL_HARD.
+PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
+PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and
+PCRE_PARTIAL_HARD.
 <pre>
   PCRE_ANCHORED
 </pre>
@@ -1328,18 +1330,25 @@
 <pre>
   a?b?
 </pre>
-is applied to a string not beginning with "a" or "b", it matches the empty
+is applied to a string not beginning with "a" or "b", it matches an empty
 string at the start of the subject. With PCRE_NOTEMPTY set, this match is not
 valid, so PCRE searches further into the string for occurrences of "a" or "b".
+<pre>
+  PCRE_NOTEMPTY_ATSTART
+</pre>
+This is like PCRE_NOTEMPTY, except that an empty string match that is not at 
+the start of the subject is permitted. If the pattern is anchored, such a match
+can occur only if the pattern contains \K.
 </P>
 <P>
-Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a special case
-of a pattern match of the empty string within its <b>split()</b> function, and
-when using the /g modifier. It is possible to emulate Perl's behaviour after
-matching a null string by first trying the match again at the same offset with
-PCRE_NOTEMPTY and PCRE_ANCHORED, and then if that fails by advancing the
-starting offset (see below) and trying an ordinary match again. There is some
-code that demonstrates how to do this in the 
+Perl has no direct equivalent of PCRE_NOTEMPTY or PCRE_NOTEMPTY_ATSTART, but it
+does make a special case of a pattern match of the empty string within its
+<b>split()</b> function, and when using the /g modifier. It is possible to
+emulate Perl's behaviour after matching a null string by first trying the match
+again at the same offset with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED, and then
+if that fails, by advancing the starting offset (see below) and trying an
+ordinary match again. There is some code that demonstrates how to do this in
+the
 <a href="pcredemo.html"><b>pcredemo</b></a>
 sample program.
 <pre>
@@ -1389,8 +1398,8 @@
 PCRE_ERROR_PARTIAL. Otherwise, if PCRE_PARTIAL_SOFT is set, matching continues
 by testing any other alternatives. Only if they all fail is PCRE_ERROR_PARTIAL
 returned (instead of PCRE_ERROR_NOMATCH). The portion of the string that
-provided the partial match is set as the first matching string. There is a more
-detailed discussion in the
+was inspected when the partial match was found is set as the first matching
+string. There is a more detailed discussion in the
 <a href="pcrepartial.html"><b>pcrepartial</b></a>
 documentation.
 </P>
@@ -1837,8 +1846,8 @@
 just once, and does not backtrack. This has different characteristics to the
 normal algorithm, and is not compatible with Perl. Some of the features of PCRE
 patterns are not supported. Nevertheless, there are times when this kind of
-matching can be useful. For a discussion of the two matching algorithms, see
-the
+matching can be useful. For a discussion of the two matching algorithms, and a 
+list of features that <b>pcre_dfa_exec()</b> does not support, see the
 <a href="pcrematching.html"><b>pcrematching</b></a>
 documentation.
 </P>
@@ -1880,10 +1889,10 @@
 <P>
 The unused bits of the <i>options</i> argument for <b>pcre_dfa_exec()</b> must be
 zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_<i>xxx</i>,
-PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD,
-PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
-four of these are exactly the same as for <b>pcre_exec()</b>, so their
-description is not repeated here.
+PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
+PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST,
+and PCRE_DFA_RESTART. All but the last four of these are exactly the same as
+for <b>pcre_exec()</b>, so their description is not repeated here.
 <pre>
   PCRE_PARTIAL_HARD
   PCRE_PARTIAL_SOFT 
@@ -1896,8 +1905,8 @@
 been found. When PCRE_PARTIAL_SOFT is set, the return code PCRE_ERROR_NOMATCH
 is converted into PCRE_ERROR_PARTIAL if the end of the subject is reached,
 there have been no complete matches, but there is still at least one matching
-possibility. The portion of the string that provided the longest partial match
-is set as the first matching string in both cases.
+possibility. The portion of the string that was inspected when the longest
+partial match was found is set as the first matching string in both cases.
 <pre>
   PCRE_DFA_SHORTEST
 </pre>
@@ -2009,7 +2018,7 @@
 </P>
 <br><a name="SEC22" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 01 September 2009
+Last updated: 11 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
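
Editorial sketch (not part of the commit): the pcreapi.html text above says that
the pattern a?b? matches an empty string at the start of a subject that does not
begin with "a" or "b", and that with PCRE_NOTEMPTY set the search moves further
into the string. The snippet below illustrates that behaviour with Python's
standard "re" module, which is only a stand-in for a backtracking matcher and is
not PCRE. Python has no PCRE_NOTEMPTY option, so skipping empty matches is an
assumption used here as a rough analogue that happens to agree with PCRE for this
particular pattern. The subject string "xyab" is made up for illustration.

```python
import re

# Python's re module stands in for a backtracking matcher; it is not PCRE.
pattern = re.compile(r"a?b?")
subject = "xyab"  # hypothetical subject that does not begin with "a" or "b"

# Unrestricted search: both optional items match nothing at offset 0.
first = pattern.search(subject)
empty_span = first.span()

# Rough analogue of PCRE_NOTEMPTY for this pattern: reject empty matches
# and keep scanning forward until something non-empty is found.
non_empty = next(m for m in pattern.finditer(subject) if m.group())
non_empty_span = non_empty.span()

print(empty_span, non_empty_span, non_empty.group())
```

The real option differs in that it makes the matcher try other alternatives at
the same position before moving on; for a?b? there are none, so the outcome is
the same.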

Modified: code/trunk/doc/html/pcrebuild.html
===================================================================
--- code/trunk/doc/html/pcrebuild.html    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/html/pcrebuild.html    2009-09-18 19:12:35 UTC (rev 453)
@@ -39,10 +39,16 @@
 the optional features are selected or deselected by providing options to
 <b>configure</b> before running the <b>make</b> command. However, the same
 options can be selected in both Unix-like and non-Unix-like environments using
-the GUI facility of <b>CMakeSetup</b> if you are using <b>CMake</b> instead of
-<b>configure</b> to build PCRE.
+the GUI facility of <b>cmake-gui</b> if you are using <b>CMake</b> instead of
+<b>configure</b> to build PCRE. 
 </P>
 <P>
+There is a lot more information about building PCRE in non-Unix-like 
+environments in the file called <i>NON_UNIX_USE</i>, which is part of the PCRE 
+distribution. You should consult this file as well as the <i>README</i> file if 
+you are building in a non-Unix-like environment.
+</P>
+<P>
 The complete list of options for <b>configure</b> (which includes the standard
 ones such as the selection of the installation directory) can be obtained by
 running
@@ -339,7 +345,7 @@
 </P>
 <br><a name="SEC18" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 17 March 2009
+Last updated: 06 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>

Modified: code/trunk/doc/html/pcrecompat.html
===================================================================
--- code/trunk/doc/html/pcrecompat.html    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/html/pcrecompat.html    2009-09-18 19:12:35 UTC (rev 453)
@@ -59,7 +59,10 @@
 built with Unicode character property support. The properties that can be
 tested with \p and \P are limited to the general category properties such as
 Lu and Nd, script names such as Greek or Han, and the derived properties Any
-and L&.
+and L&. PCRE does support the Cs (surrogate) property, which Perl does not; the
+Perl documentation says "Because Perl hides the need for the user to understand
+the internal representation of Unicode characters, there is no need to
+implement the somewhat messy concept of surrogates."
 </P>
 <P>
 7. PCRE does support the \Q...\E escape for quoting substrings. Characters in
@@ -79,7 +82,7 @@
 <P>
 8. Fairly obviously, PCRE does not support the (?{code}) and (??{code})
 constructions. However, there is support for recursive patterns. This is not
-available in Perl 5.8, but will be in Perl 5.10. Also, the PCRE "callout"
+available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE "callout"
 feature allows an external function to be called during pattern matching. See
 the
 <a href="pcrecallout.html"><b>pcrecallout</b></a>
@@ -87,7 +90,12 @@
 </P>
 <P>
 9. Subpatterns that are called recursively or as "subroutines" are always
-treated as atomic groups in PCRE. This is like Python, but unlike Perl.
+treated as atomic groups in PCRE. This is like Python, but unlike Perl. There 
+is a discussion of an example that explains this in more detail in the
+<a href="pcrepattern.html#recursiondifference">section on recursion differences from Perl</a>
+in the
+<a href="pcrecompat.html"><b>pcrecompat</b></a>
+page.
 </P>
 <P>
 10. There are some differences that are concerned with the settings of captured
@@ -97,8 +105,7 @@
 <P>
 11. PCRE does support Perl 5.10's backtracking verbs (*ACCEPT), (*FAIL), (*F),
 (*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but only in the forms without an
-argument. PCRE does not support (*MARK). If (*ACCEPT) is within capturing
-parentheses, PCRE does not set that capture group; this is different to Perl.
+argument. PCRE does not support (*MARK).
 </P>
 <P>
 12. PCRE provides some extensions to the Perl regular expression facilities.
@@ -130,8 +137,8 @@
 only at the first matching position in the subject string.
 <br>
 <br>
-(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE
-options for <b>pcre_exec()</b> have no Perl equivalents.
+(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, and
+PCRE_NO_AUTO_CAPTURE options for <b>pcre_exec()</b> have no Perl equivalents.
 <br>
 <br>
 (g) The \R escape sequence can be restricted to match only CR, LF, or CRLF
@@ -170,7 +177,7 @@
 REVISION
 </b><br>
 <P>
-Last updated: 25 August 2009
+Last updated: 18 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>

Modified: code/trunk/doc/html/pcredemo.html
===================================================================
--- code/trunk/doc/html/pcredemo.html    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/html/pcredemo.html    2009-09-18 19:12:35 UTC (rev 453)
@@ -240,12 +240,12 @@
 *                                                                        *
 * If the previous match WAS for an empty string, we can't do that, as it *
 * would lead to an infinite loop. Instead, a special call of pcre_exec() *
-* is made with the PCRE_NOTEMPTY and PCRE_ANCHORED flags set. The first  *
-* of these tells PCRE that an empty string is not a valid match; other   *
-* possibilities must be tried. The second flag restricts PCRE to one     *
-* match attempt at the initial string position. If this match succeeds,  *
-* an alternative to the empty string match has been found, and we can    *
-* proceed round the loop.                                                *
+* is made with the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED flags set.    *
+* The first of these tells PCRE that an empty string at the start of the *
+* subject is not a valid match; other possibilities must be tried. The   *
+* second flag restricts PCRE to one match attempt at the initial string  *
+* position. If this match succeeds, an alternative to the empty string   *
+* match has been found, and we can proceed round the loop.               *
 *************************************************************************/

 if (!find_all)
@@ -268,7 +268,7 @@
   if (ovector[0] == ovector[1])
     {
     if (ovector[0] == subject_length) break;
-    options = PCRE_NOTEMPTY | PCRE_ANCHORED;
+    options = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED;
     }

 /* Run the next matching operation */

Modified: code/trunk/doc/html/pcregrep.html
===================================================================
--- code/trunk/doc/html/pcregrep.html    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/html/pcregrep.html    2009-09-18 19:12:35 UTC (rev 453)
@@ -98,7 +98,7 @@
 </P>
 <P>
 Patterns that can match an empty string are accepted, but empty string
-matches are not recognized. An example is the pattern "(super)?(man)?", in
+matches are never recognized. An example is the pattern "(super)?(man)?", in
 which all components are optional. This pattern finds all occurrences of both
 "super" and "man"; the output differs from matching with "super|man" when only
 the matching substrings are being shown.
@@ -538,7 +538,7 @@
 </P>
 <br><a name="SEC13" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 12 August 2009
+Last updated: 13 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>

Modified: code/trunk/doc/html/pcrematching.html
===================================================================
--- code/trunk/doc/html/pcrematching.html    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/html/pcrematching.html    2009-09-18 19:12:35 UTC (rev 453)
@@ -116,6 +116,12 @@
 matches that start at later positions.
 </P>
 <P>
+Although the general principle of this matching algorithm is that it scans the 
+subject string only once, without backtracking, there is one exception: when a 
+lookbehind assertion is encountered, the preceding characters have to be
+re-inspected.
+</P>
+<P>
 There are a number of features of PCRE regular expressions that are not
 supported by the alternative matching algorithm. They are as follows:
 </P>
@@ -209,7 +215,7 @@
 </P>
 <br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 25 August 2009
+Last updated: 05 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>

Modified: code/trunk/doc/html/pcrepartial.html
===================================================================
--- code/trunk/doc/html/pcrepartial.html    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/html/pcrepartial.html    2009-09-18 19:12:35 UTC (rev 453)
@@ -51,10 +51,10 @@
 <P>
 PCRE supports partial matching by means of the PCRE_PARTIAL_SOFT and
 PCRE_PARTIAL_HARD options, which can be set when calling <b>pcre_exec()</b> or
-<b>pcre_dfa_exec()</b>. For backwards compatibility, PCRE_PARTIAL is a synonym 
-for PCRE_PARTIAL_SOFT. The essential difference between the two options is 
-whether or not a partial match is preferred to an alternative complete match, 
-though the details differ between the two matching functions. If both options 
+<b>pcre_dfa_exec()</b>. For backwards compatibility, PCRE_PARTIAL is a synonym
+for PCRE_PARTIAL_SOFT. The essential difference between the two options is
+whether or not a partial match is preferred to an alternative complete match,
+though the details differ between the two matching functions. If both options
 are set, PCRE_PARTIAL_HARD takes precedence.
 </P>
 <P>
@@ -74,50 +74,69 @@
 If PCRE_PARTIAL_SOFT is set, the partial match is remembered, but matching
 continues as normal, and other alternatives in the pattern are tried. If no
 complete match can be found, <b>pcre_exec()</b> returns PCRE_ERROR_PARTIAL
-instead of PCRE_ERROR_NOMATCH, and if there are at least two slots in the
-offsets vector, they are filled in with the offsets of the longest string that
-partially matched. Consider this pattern:
+instead of PCRE_ERROR_NOMATCH. If there are at least two slots in the offsets
+vector, the first of them is set to the offset of the earliest character that
+was inspected when the partial match was found. For convenience, the second
+offset points to the end of the string so that a substring can easily be
+extracted.
+</P>
+<P>
+For the majority of patterns, the first offset identifies the start of the
+partially matched string. However, for patterns that contain lookbehind
+assertions, or \K, or begin with \b or \B, earlier characters have been
+inspected while carrying out the match. For example:
 <pre>
+  /(?&#60;=abc)123/
+</pre>
+This pattern matches "123", but only if it is preceded by "abc". If the subject
+string is "xyzabc12", the offsets after a partial match are for the substring
+"abc12", because all these characters are needed if another match is tried
+with extra characters added.
+</P>
+<P>
+If there is more than one partial match, the first one that was found provides
+the data that is returned. Consider this pattern:
+<pre>
   /123\w+X|dogY/
 </pre>
 If this is matched against the subject string "abc123dog", both
-alternatives fail to match, but the end of the subject is reached during 
+alternatives fail to match, but the end of the subject is reached during
 matching, so PCRE_ERROR_PARTIAL is returned instead of PCRE_ERROR_NOMATCH. The
-offsets are set to 3 and 9, identifying "123dog" as the longest partial match
+offsets are set to 3 and 9, identifying "123dog" as the first partial match
 that was found. (In this example, there are two partial matches, because "dog"
 on its own partially matches the second alternative.)
 </P>
 <P>
-If PCRE_PARTIAL_HARD is set for <b>pcre_exec()</b>, it returns 
+If PCRE_PARTIAL_HARD is set for <b>pcre_exec()</b>, it returns
 PCRE_ERROR_PARTIAL as soon as a partial match is found, without continuing to
 search for possible complete matches. The difference between the two options
 can be illustrated by a pattern such as:
 <pre>
   /dog(sbody)?/
 </pre>
-This matches either "dog" or "dogsbody", greedily (that is, it prefers the 
+This matches either "dog" or "dogsbody", greedily (that is, it prefers the
 longer string if possible). If it is matched against the string "dog" with
-PCRE_PARTIAL_SOFT, it yields a complete match for "dog". However, if 
-PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL. On the other hand, 
+PCRE_PARTIAL_SOFT, it yields a complete match for "dog". However, if
+PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL. On the other hand,
 if the pattern is made ungreedy the result is different:
 <pre>
   /dog(sbody)??/
 </pre>
-In this case the result is always a complete match because <b>pcre_exec()</b> 
-finds that first, and it never continues after finding a match. It might be 
+In this case the result is always a complete match because <b>pcre_exec()</b>
+finds that first, and it never continues after finding a match. It might be
 easier to follow this explanation by thinking of the two patterns like this:
 <pre>
   /dog(sbody)?/    is the same as  /dogsbody|dog/
   /dog(sbody)??/   is the same as  /dog|dogsbody/
 </pre>
-The second pattern will never match "dogsbody" when <b>pcre_exec()</b> is 
+The second pattern will never match "dogsbody" when <b>pcre_exec()</b> is
 used, because it will always find the shorter match first.
 </P>
 <br><a name="SEC3" href="#TOC1">PARTIAL MATCHING USING pcre_dfa_exec()</a><br>
 <P>
-The <b>pcre_dfa_exec()</b> function moves along the subject string character by 
-character, without backtracking, searching for all possible matches 
-simultaneously. If the end of the subject is reached before the end of the 
+The <b>pcre_dfa_exec()</b> function moves along the subject string character by
+character, without backtracking, searching for all possible matches
+simultaneously. If the end of the subject is reached before the end of the
 pattern, there is the possibility of a partial match, again provided that at
 least one character has matched.
 </P>
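
Editorial sketch (not part of the commit): the pcre_exec() discussion in the
hunk above says that /dog(sbody)?/ behaves like /dogsbody|dog/ and that
/dog(sbody)??/ behaves like /dog|dogsbody/, so the ungreedy form never matches
"dogsbody". This can be checked with any backtracking matcher that returns the
first match it finds. The snippet uses Python's standard "re" module as such a
stand-in; it is not PCRE and it has no partial-matching options, so only the
complete-match behaviour is shown.

```python
import re

subject = "dogsbody"

# Greedy optional group: the longer string is tried first.
greedy = re.match(r"dog(sbody)?", subject).group()
greedy_alt = re.match(r"dogsbody|dog", subject).group()

# Ungreedy optional group: the shorter string is tried first, and a
# backtracking matcher stops at the first complete match.
lazy = re.match(r"dog(sbody)??", subject).group()
lazy_alt = re.match(r"dog|dogsbody", subject).group()

print(greedy, greedy_alt, lazy, lazy_alt)
```

Each rewritten alternation gives the same result as the quantified form it is
paired with, which is the ordering argument the documentation relies on.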
@@ -125,40 +144,40 @@
 When PCRE_PARTIAL_SOFT is set, PCRE_ERROR_PARTIAL is returned only if there
 have been no complete matches. Otherwise, the complete matches are returned.
 However, if PCRE_PARTIAL_HARD is set, a partial match takes precedence over any
-complete matches. The portion of the string that provided the longest partial
-match is set as the first matching string, provided there are at least two
-slots in the offsets vector.
+complete matches. The portion of the string that was inspected when the longest
+partial match was found is set as the first matching string, provided there are
+at least two slots in the offsets vector.
 </P>
 <P>
-Because <b>pcre_dfa_exec()</b> always searches for all possible matches, and 
+Because <b>pcre_dfa_exec()</b> always searches for all possible matches, and
 there is no difference between greedy and ungreedy repetition, its behaviour is
-different from <b>pcre_exec</b> when PCRE_PARTIAL_HARD is set. Consider the 
+different from <b>pcre_exec</b> when PCRE_PARTIAL_HARD is set. Consider the
 string "dog" matched against the ungreedy pattern shown above:
 <pre>
   /dog(sbody)??/
 </pre>
-Whereas <b>pcre_exec()</b> stops as soon as it finds the complete match for 
+Whereas <b>pcre_exec()</b> stops as soon as it finds the complete match for
 "dog", <b>pcre_dfa_exec()</b> also finds the partial match for "dogsbody", and
 so returns that when PCRE_PARTIAL_HARD is set.
 </P>
 <br><a name="SEC4" href="#TOC1">PARTIAL MATCHING AND WORD BOUNDARIES</a><br>
 <P>
-If a pattern ends with one of sequences \w or \W, which test for word 
-boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive 
+If a pattern ends with one of sequences \w or \W, which test for word
+boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive
 results. Consider this pattern:
 <pre>
   /\bcat\b/
 </pre>
 This matches "cat", provided there is a word boundary at either end. If the
 subject string is "the cat", the comparison of the final "t" with a following
-character cannot take place, so a partial match is found. However, 
-<b>pcre_exec()</b> carries on with normal matching, which matches \b at the end 
-of the subject when the last character is a letter, thus finding a complete 
-match. The result, therefore, is <i>not</i> PCRE_ERROR_PARTIAL. The same thing 
+character cannot take place, so a partial match is found. However,
+<b>pcre_exec()</b> carries on with normal matching, which matches \b at the end
+of the subject when the last character is a letter, thus finding a complete
+match. The result, therefore, is <i>not</i> PCRE_ERROR_PARTIAL. The same thing
 happens with <b>pcre_dfa_exec()</b>, because it also finds the complete match.
 </P>
 <P>
-Using PCRE_PARTIAL_HARD in this case does yield PCRE_ERROR_PARTIAL, because 
+Using PCRE_PARTIAL_HARD in this case does yield PCRE_ERROR_PARTIAL, because
 then the partial match takes precedence.
 </P>
 <br><a name="SEC5" href="#TOC1">FORMERLY RESTRICTED PATTERNS</a><br>
@@ -236,10 +255,10 @@
 </P>
 <br><a name="SEC8" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_exec()</a><br>
 <P>
-From release 8.00, <b>pcre_exec()</b> can also be used to do multi-segment 
-matching. Unlike <b>pcre_dfa_exec()</b>, it is not possible to restart the 
-previous match with a new segment of data. Instead, new data must be added to 
-the previous subject string, and the entire match re-run, starting from the 
+From release 8.00, <b>pcre_exec()</b> can also be used to do multi-segment
+matching. Unlike <b>pcre_dfa_exec()</b>, it is not possible to restart the
+previous match with a new segment of data. Instead, new data must be added to
+the previous subject string, and the entire match re-run, starting from the
 point where the partial match occurred. Earlier data can be discarded.
 Consider an unanchored pattern that matches dates:
 <pre>
@@ -247,15 +266,21 @@
   data&#62; The date is 23ja\P
   Partial match: 23ja
 </pre>
-At this stage, an application could discard the text preceding "23ja", add on 
-text from the next segment, and call <b>pcre_exec()</b> again. Unlike 
-<b>pcre_dfa_exec()</b>, the entire matching string must always be available, and 
-the complete matching process occurs for each call, so more memory and more 
+At this stage, an application could discard the text preceding "23ja", add on
+text from the next segment, and call <b>pcre_exec()</b> again. Unlike
+<b>pcre_dfa_exec()</b>, the entire matching string must always be available, and
+the complete matching process occurs for each call, so more memory and more
 processing time are needed.
 </P>
+<P>
+<b>Note:</b> If the pattern contains lookbehind assertions, or \K, or starts
+with \b or \B, the string that is returned for a partial match will include
+characters that precede the partially matched string itself, because these must
+be retained when adding on more characters for a subsequent matching attempt.
+</P>
 <br><a name="SEC9" href="#TOC1">ISSUES WITH MULTI-SEGMENT MATCHING</a><br>
 <P>
-Certain types of pattern may give problems with multi-segment matching, 
+Certain types of pattern may give problems with multi-segment matching,
 whichever matching function is used.
 </P>
 <P>
@@ -264,18 +289,18 @@
 subject string for any call does not contain the beginning or end of a line.
 </P>
 <P>
-2. If the pattern contains backward assertions (including \b or \B), you need
-to arrange for some overlap in the subject strings to allow for them to be
-correctly tested at the start of each substring. For example, using
-<b>pcre_dfa_exec()</b>, you could pass the subject in chunks that are 500 bytes
-long, but in a buffer of 700 bytes, with the starting offset set to 200 and the
-previous 200 bytes at the start of the buffer.
+2. Lookbehind assertions at the start of a pattern are catered for in the
+offsets that are returned for a partial match. However, in theory, a lookbehind
+assertion later in the pattern could require even earlier characters to be
+inspected, and it might not have been reached when a partial match occurs. This
+is probably an extremely unlikely case; you could guard against it to a certain
+extent by always including extra characters at the start.
 </P>
 <P>
 3. Matching a subject string that is split into multiple segments may not
 always produce exactly the same result as matching over one single long string,
-especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and 
-Word Boundaries" above describes an issue that arises if the pattern ends with 
+especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and
+Word Boundaries" above describes an issue that arises if the pattern ends with
 \b or \B. Another kind of difference may occur when there are multiple
 matching possibilities, because a partial match result is given only when there
 are no completed matches. This means that as soon as the shortest match has
@@ -284,7 +309,7 @@
 <pre>
     re&#62; /dog(sbody)?/
   data&#62; dogsb\P
-   0: dog    
+   0: dog
   data&#62; do\P\D
   Partial match: do
   data&#62; gsb\R\P\D
@@ -308,17 +333,17 @@
 <pre>
     re&#62; /dog(sbody)?/
   data&#62; dogsb\P\P
-  Partial match: dogsb 
+  Partial match: dogsb
   data&#62; do\P\D
   Partial match: do
   data&#62; gsb\R\P\P\D
-  Partial match: gsb    
+  Partial match: gsb

 </PRE>
 </P>
 <P>
 4. Patterns that contain alternatives at the top level which do not all
-start with the same pattern item may not work as expected when 
+start with the same pattern item may not work as expected when
 <b>pcre_dfa_exec()</b> is used. For example, consider this pattern:
 <pre>
   1234|3789
@@ -335,7 +360,7 @@
   1234|ABCD
 </pre>
 where no string can be a partial match for both alternatives. This is not a
-problem if <b>pcre_exec()</b> is used, because the entire match has to be rerun 
+problem if <b>pcre_exec()</b> is used, because the entire match has to be rerun
 each time:
 <pre>
     re&#62; /1234|3789/
@@ -357,7 +382,7 @@
 </P>
 <br><a name="SEC11" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 31 August 2009
+Last updated: 05 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
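
The multi-segment procedure that the pcrepartial.html text above describes for pcre_exec() (keep the partially matched tail, append the next segment, rerun the whole match) can be sketched as follows. This is only an illustration: Python's standard re module has no partial matching, so match_or_partial() is a hypothetical toy stand-in that handles literal alternatives such as 1234|3789, loosely imitating PCRE_PARTIAL_SOFT. It is not PCRE's API and it ignores lookbehinds, \b and \K.

```python
def match_or_partial(alternatives, subject):
    """Toy stand-in for a PCRE_PARTIAL_SOFT match over literal alternatives.

    Returns ("match", start, end), ("partial", start, len(subject)) or None.
    A complete match anywhere takes precedence over a partial match.
    """
    partial_start = None
    for start in range(len(subject)):
        rest = subject[start:]
        for alt in alternatives:
            if rest.startswith(alt):
                return ("match", start, start + len(alt))
            # The subject ran out while still agreeing with this alternative.
            if partial_start is None and alt.startswith(rest):
                partial_start = start
    if partial_start is not None:
        return ("partial", partial_start, len(subject))
    return None


def match_segments(alternatives, segments):
    """Feed segments one at a time, keeping only the partially matched tail."""
    buffer = ""
    for segment in segments:
        buffer += segment
        result = match_or_partial(alternatives, buffer)
        if result is None:
            buffer = ""                  # nothing needs to be retained
        elif result[0] == "match":
            return buffer[result[1]:result[2]]
        else:
            buffer = buffer[result[1]:]  # discard text before the partial match
    return None
```

Because the whole match is rerun over the retained text each time, an alternative that starts later in the retained text is still found, which is the point item 4 makes about pcre_exec().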

Modified: code/trunk/doc/html/pcrepattern.html
===================================================================
--- code/trunk/doc/html/pcrepattern.html    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/html/pcrepattern.html    2009-09-18 19:12:35 UTC (rev 453)
@@ -644,10 +644,10 @@
 cannot be tested by PCRE, unless UTF-8 validity checking has been turned off
 (see the discussion of PCRE_NO_UTF8_CHECK in the
 <a href="pcreapi.html"><b>pcreapi</b></a>
-page).
+page). Perl does not support the Cs property.
 </P>
 <P>
-The long synonyms for these properties that Perl supports (such as \p{Letter})
+The long synonyms for property names that Perl supports (such as \p{Letter})
 are not supported by PCRE, nor is it permitted to prefix any of these
 properties with "Is".
 </P>
@@ -1922,7 +1922,7 @@
 Obviously, PCRE cannot support the interpolation of Perl code. Instead, it
 supports special syntax for recursion of the entire pattern, and also for
 individual subpattern recursion. After its introduction in PCRE and Python,
-this kind of recursion was introduced into Perl at release 5.10.
+this kind of recursion was subsequently introduced into Perl at release 5.10.
 </P>
 <P>
 A special item that consists of (? followed by a number greater than zero and a
@@ -1932,12 +1932,6 @@
 a recursive call of the entire regular expression.
 </P>
 <P>
-In PCRE (like Python, but unlike Perl), a recursive subpattern call is always
-treated as an atomic group. That is, once it has matched some of the subject
-string, it is never re-entered, even if it contains untried alternatives and
-there is a subsequent matching failure.
-</P>
-<P>
 This PCRE pattern solves the nested parentheses problem (assume the
 PCRE_EXTENDED option is set so that white space is ignored):
 <pre>
@@ -2028,6 +2022,72 @@
 In this pattern, (?(R) is the start of a conditional subpattern, with two
 different alternatives for the recursive and non-recursive cases. The (?R) item
 is the actual recursive call.
+<a name="recursiondifference"></a></P>
+<br><b>
+Recursion difference from Perl
+</b><br>
+<P>
+In PCRE (like Python, but unlike Perl), a recursive subpattern call is always
+treated as an atomic group. That is, once it has matched some of the subject
+string, it is never re-entered, even if it contains untried alternatives and
+there is a subsequent matching failure. This can be illustrated by the 
+following pattern, which purports to match a palindromic string that contains 
+an odd number of characters (for example, "a", "aba", "abcba", "abcdcba"):
+<pre>
+  ^(.|(.)(?1)\2)$
+</pre>
+The idea is that it either matches a single character, or two identical 
+characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE 
+it does not if the subject string is longer than three characters. Consider the
+subject string "abcba":
+</P>
+<P>
+At the top level, the first character is matched, but as it is not at the end 
+of the string, the first alternative fails; the second alternative is taken
+and the recursion kicks in. The recursive call to subpattern 1 successfully
+matches the next character ("b"). (Note that the beginning and end of line
+tests are not part of the recursion).
+</P>
+<P>
+Back at the top level, the next character ("c") is compared with what
+subpattern 2 matched, which was "a". This fails. Because the recursion is 
+treated as an atomic group, there are now no backtracking points, and so the
+entire match fails. (Perl is able, at this point, to re-enter the recursion and
+try the second alternative.) However, if the pattern is written with the
+alternatives in the other order, things are different:
+<pre>
+  ^((.)(?1)\2|.)$
+</pre>
+This time, the recursing alternative is tried first, and continues to recurse 
+until it runs out of characters, at which point the recursion fails. But this 
+time we do have another alternative to try at the higher level. That is the big 
+difference: in the previous case the remaining alternative is at a deeper
+recursion level, which PCRE cannot use.
+</P>
+<P>
+To change the pattern so that it matches all palindromic strings, not just those 
+with an odd number of characters, it is tempting to change the pattern to this:
+<pre>
+  ^((.)(?1)\2|.?)$
+</pre>
+Again, this works in Perl, but not in PCRE, and for the same reason. When a 
+deeper recursion has matched a single character, it cannot be entered again in 
+order to match an empty string. The solution is to separate the two cases, and 
+write out the odd and even cases as alternatives at the higher level:
+<pre>
+  ^(?:((.)(?1)\2|)|((.)(?3)\4|.))
+</pre>
+If you want to match typical palindromic phrases, the pattern has to ignore all 
+non-word characters, which can be done like this:
+<pre>
+  ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
+</pre>
+If run with the PCRE_CASELESS option, this pattern matches phrases such as "A 
+man, a plan, a canal: Panama!" and it works well in both PCRE and Perl. Note 
+the use of the possessive quantifier *+ to avoid backtracking into sequences of 
+non-word characters. Without this, PCRE takes a great deal longer (ten times or
+more) to match typical phrases, and Perl takes so long that you think it has
+gone into a loop.
 <a name="subpatternsassubroutines"></a></P>
 <br><a name="SEC22" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
 <P>
@@ -2138,6 +2198,12 @@
 <b>pcre_dfa_exec()</b>.
 </P>
 <P>
+If any of these verbs are used in an assertion subpattern, their effect is 
+confined to that subpattern; it does not extend to the surrounding pattern.
+Note that assertion subpatterns are processed as anchored at the point where 
+they are tested.
+</P>
+<P>
 The new verbs make use of what was previously invalid syntax: an opening
 parenthesis followed by an asterisk. In Perl, they are generally of the form
 (*VERB:ARG) but PCRE does not support the use of arguments, so its general
@@ -2154,14 +2220,13 @@
 </pre>
 This verb causes the match to end successfully, skipping the remainder of the
 pattern. When inside a recursion, only the innermost pattern is ended
-immediately. PCRE differs from Perl in what happens if the (*ACCEPT) is inside
-capturing parentheses. In Perl, the data so far is captured: in PCRE no data is
-captured. For example:
+immediately. If the (*ACCEPT) is inside capturing parentheses, the data so far
+is captured. (This feature was added to PCRE at release 8.00.) For example:
 <pre>
-  A(A|B(*ACCEPT)|C)D
+  A((?:A|B(*ACCEPT)|C)D)
 </pre>
-This matches "AB", "AAD", or "ACD", but when it matches "AB", no data is
-captured.
+This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by 
+the outer parentheses.
 <pre>
   (*FAIL) or (*F)
 </pre>
@@ -2253,7 +2318,7 @@
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 11 April 2009
+Last updated: 18 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
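
The "Recursion difference from Perl" text added above can be checked with a small simulation. The sketch below is not PCRE or Perl code; it is a hand-written backtracking matcher for the two odd-palindrome patterns only, with hypothetical names (group1, matches). The atomic argument selects whether a recursive call to group 1 may be re-entered after it has matched (the Perl behaviour) or not (the PCRE behaviour described above).

```python
def group1(subject, pos, atomic, recurse_first):
    """Yield each end position at which capture group 1 can match from pos."""

    def single():
        # Alternative "."
        if pos < len(subject):
            yield pos + 1

    def recursing():
        # Alternative "(.)(?1)\2"
        if pos < len(subject):
            ch = subject[pos]                  # what group 2 captured
            for end in group1(subject, pos + 1, atomic, recurse_first):
                if end < len(subject) and subject[end] == ch:   # \2
                    yield end + 1
                if atomic:
                    break                      # never re-enter the recursion

    order = (recursing, single) if recurse_first else (single, recursing)
    for alternative in order:
        yield from alternative()


def matches(subject, atomic, recurse_first=False):
    """True if ^(...)$ matches; the top-level group may backtrack freely."""
    return any(end == len(subject)
               for end in group1(subject, 0, atomic, recurse_first))
```

With atomic=True, "aba" matches but "abcba" does not; with atomic=False, or with recurse_first=True (the reordered pattern), "abcba" matches.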

Modified: code/trunk/doc/html/pcreposix.html
===================================================================
--- code/trunk/doc/html/pcreposix.html    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/html/pcreposix.html    2009-09-18 19:12:35 UTC (rev 453)
@@ -66,6 +66,11 @@
 replacement library. Other POSIX options are not even defined.
 </P>
 <P>
+There are also some other options that are not defined by POSIX. These have
+been added at the request of users who want to make use of certain
+PCRE-specific features via the POSIX calling interface.
+</P>
+<P>
 When PCRE is called via these functions, it is only the API that is POSIX-like
 in style. The syntax and semantics of the regular expressions themselves are
 still those of Perl, subject to the setting of various PCRE options, as
@@ -121,6 +126,12 @@
 <i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
 are returned.
 <pre>
+  REG_UNGREEDY
+</pre>
+The PCRE_UNGREEDY option is set when the regular expression is passed for 
+compilation to the native function. Note that REG_UNGREEDY is not part of the
+POSIX standard.   
+<pre>
   REG_UTF8
 </pre>
 The PCRE_UTF8 option is set when the regular expression is passed for
@@ -134,7 +145,7 @@
 particular, the way it handles newline characters in the subject string is the
 Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
 <i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
-newlines are matched by . (they aren't) or by a negative class such as [^a]
+newlines are matched by . (they are not) or by a negative class such as [^a]
 (they are).
 </P>
 <P>
@@ -222,6 +233,10 @@
 <b>regexec()</b> are ignored.
 </P>
 <P>
+If the value of <i>nmatch</i> is zero, or if the value of <i>pmatch</i> is NULL,
+no data about any matched strings is returned.
+</P>
+<P>
 Otherwise, the portion of the string that was matched, and also any captured
 substrings, are returned via the <i>pmatch</i> argument, which points to an
 array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
@@ -262,7 +277,7 @@
 </P>
 <br><a name="SEC9" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 15 August 2009
+Last updated: 02 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
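
As a rough illustration of what the REG_UNGREEDY option added above changes: PCRE_UNGREEDY inverts the default greediness of quantifiers. Python's re module has no equivalent flag, so this sketch (with a hypothetical helper, first_tag) writes the lazy quantifier explicitly; under REG_UNGREEDY a plain ".*" would be expected to behave like the lazy form shown here.

```python
import re


def first_tag(subject, lazy):
    """Return what '<.*>' (or its lazy form '<.*?>') matches at the start."""
    pattern = r"<.*?>" if lazy else r"<.*>"
    found = re.match(pattern, subject)
    return found.group(0) if found else None
```

For the subject "<a><b>", the greedy form takes the whole string while the lazy form stops at the first ">".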

Modified: code/trunk/doc/html/pcretest.html
===================================================================
--- code/trunk/doc/html/pcretest.html    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/html/pcretest.html    2009-09-18 19:12:35 UTC (rev 453)
@@ -246,11 +246,11 @@
 </P>
 <P>
 If any call to <b>pcre_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches an
-empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
-flags set in order to search for another, non-empty, match at the same point.
-If this second match fails, the start offset is advanced by one, and the normal
-match is retried. This imitates the way Perl handles such cases when using the
-<b>/g</b> modifier or the <b>split()</b> function.
+empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and
+PCRE_ANCHORED flags set in order to search for another, non-empty, match at the
+same point. If this second match fails, the start offset is advanced by one 
+character, and the normal match is retried. This imitates the way Perl handles
+such cases when using the <b>/g</b> modifier or the <b>split()</b> function.
 </P>
 <br><b>
 Other modifiers
@@ -370,7 +370,8 @@
                ated by next non-alphanumeric character)
   \L         call pcre_get_substringlist() after a successful match
   \M         discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
-  \N         pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \N         pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>; if used twice, pass the
+               PCRE_NOTEMPTY_ATSTART option 
   \Odd       set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits)
   \P         pass the PCRE_PARTIAL_SOFT option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>; if used twice, pass the
                PCRE_PARTIAL_HARD option 
@@ -454,10 +455,11 @@
 <P>
 When a match succeeds, pcretest outputs the list of captured substrings that
 <b>pcre_exec()</b> returns, starting with number 0 for the string that matched
-the whole pattern. Otherwise, it outputs "No match" or "Partial match:"
-followed by the partially matching substring when <b>pcre_exec()</b> returns
-PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL, respectively, and otherwise the PCRE
-negative error number. Here is an example of an interactive <b>pcretest</b> run.
+the whole pattern. Otherwise, it outputs "No match" when the return is
+PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching
+substring when <b>pcre_exec()</b> returns PCRE_ERROR_PARTIAL. For any other
+return, it outputs the PCRE negative error number. Here is an example of an
+interactive <b>pcretest</b> run.
 <pre>
   $ pcretest
   PCRE version 7.0 30-Nov-2006
@@ -706,7 +708,7 @@
 </P>
 <br><a name="SEC15" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 29 August 2009
+Last updated: 11 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>

Modified: code/trunk/doc/pcre.txt
===================================================================
--- code/trunk/doc/pcre.txt    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/pcre.txt    2009-09-18 19:12:35 UTC (rev 453)
@@ -282,20 +282,25 @@
        script,  where the optional features are selected or deselected by pro-
        viding options to configure before running the make  command.  However,
        the  same  options  can be selected in both Unix-like and non-Unix-like
-       environments using the GUI facility of  CMakeSetup  if  you  are  using
-       CMake instead of configure to build PCRE.
+       environments using the GUI facility of cmake-gui if you are using CMake
+       instead of configure to build PCRE.

+       There  is  a  lot more information about building PCRE in non-Unix-like
+       environments in the file called NON_UNIX_USE, which is part of the PCRE
+       distribution.  You  should consult this file as well as the README file
+       if you are building in a non-Unix-like environment.
+
        The complete list of options for configure (which includes the standard
-       ones such as the  selection  of  the  installation  directory)  can  be
+       ones  such  as  the  selection  of  the  installation directory) can be
        obtained by running

          ./configure --help

-       The  following  sections  include  descriptions  of options whose names
+       The following sections include  descriptions  of  options  whose  names
        begin with --enable or --disable. These settings specify changes to the
-       defaults  for  the configure command. Because of the way that configure
-       works, --enable and --disable always come in pairs, so  the  complemen-
-       tary  option always exists as well, but as it specifies the default, it
+       defaults for the configure command. Because of the way  that  configure
+       works,  --enable  and --disable always come in pairs, so the complemen-
+       tary option always exists as well, but as it specifies the default,  it
        is not described.

@@ -316,46 +321,46 @@

          --enable-utf8

-       to the configure command. Of itself, this  does  not  make  PCRE  treat
-       strings  as UTF-8. As well as compiling PCRE with this option, you also
-       have  to  set  the  PCRE_UTF8  option when you call the  pcre_compile()
+       to  the  configure  command.  Of  itself, this does not make PCRE treat
+       strings as UTF-8. As well as compiling PCRE with this option, you  also
+       have  to  set  the  PCRE_UTF8  option  when you call the pcre_compile()
        function.

-       If  you set --enable-utf8 when compiling in an EBCDIC environment, PCRE
+       If you set --enable-utf8 when compiling in an EBCDIC environment,  PCRE
        expects its input to be either ASCII or UTF-8 (depending on the runtime
-       option).  It  is not possible to support both EBCDIC and UTF-8 codes in
-       the same  version  of  the  library.  Consequently,  --enable-utf8  and
+       option). It is not possible to support both EBCDIC and UTF-8  codes  in
+       the  same  version  of  the  library.  Consequently,  --enable-utf8 and
        --enable-ebcdic are mutually exclusive.

UNICODE CHARACTER PROPERTY SUPPORT

-       UTF-8  support allows PCRE to process character values greater than 255
-       in the strings that it handles. On its own, however, it does  not  pro-
+       UTF-8 support allows PCRE to process character values greater than  255
+       in  the  strings that it handles. On its own, however, it does not pro-
        vide any facilities for accessing the properties of such characters. If
-       you want to be able to use the pattern escapes \P, \p,  and  \X,  which
+       you  want  to  be able to use the pattern escapes \P, \p, and \X, which
        refer to Unicode character properties, you must add

          --enable-unicode-properties

-       to  the configure command. This implies UTF-8 support, even if you have
+       to the configure command. This implies UTF-8 support, even if you  have
        not explicitly requested it.

-       Including Unicode property support adds around 30K  of  tables  to  the
-       PCRE  library.  Only  the general category properties such as Lu and Nd
+       Including  Unicode  property  support  adds around 30K of tables to the
+       PCRE library. Only the general category properties such as  Lu  and  Nd
        are supported. Details are given in the pcrepattern documentation.

CODE VALUE OF NEWLINE

-       By default, PCRE interprets the linefeed (LF) character  as  indicating
-       the  end  of  a line. This is the normal newline character on Unix-like
-       systems. You can compile PCRE to use carriage return (CR)  instead,  by
+       By  default,  PCRE interprets the linefeed (LF) character as indicating
+       the end of a line. This is the normal newline  character  on  Unix-like
+       systems.  You  can compile PCRE to use carriage return (CR) instead, by
        adding

          --enable-newline-is-cr

-       to  the  configure  command.  There  is  also  a --enable-newline-is-lf
+       to the  configure  command.  There  is  also  a  --enable-newline-is-lf
        option, which explicitly specifies linefeed as the newline character.

        Alternatively, you can specify that line endings are to be indicated by
@@ -367,35 +372,35 @@

          --enable-newline-is-anycrlf

-       which  causes  PCRE  to recognize any of the three sequences CR, LF, or
+       which causes PCRE to recognize any of the three sequences  CR,  LF,  or
        CRLF as indicating a line ending. Finally, a fifth option, specified by

          --enable-newline-is-any

        causes PCRE to recognize any Unicode newline sequence.

-       Whatever line ending convention is selected when PCRE is built  can  be
-       overridden  when  the library functions are called. At build time it is
+       Whatever  line  ending convention is selected when PCRE is built can be
+       overridden when the library functions are called. At build time  it  is
        conventional to use the standard for your operating system.

WHAT \R MATCHES

-       By default, the sequence \R in a pattern matches  any  Unicode  newline
-       sequence,  whatever  has  been selected as the line ending sequence. If
+       By  default,  the  sequence \R in a pattern matches any Unicode newline
+       sequence, whatever has been selected as the line  ending  sequence.  If
        you specify

          --enable-bsr-anycrlf

-       the default is changed so that \R matches only CR, LF, or  CRLF.  What-
-       ever  is selected when PCRE is built can be overridden when the library
+       the  default  is changed so that \R matches only CR, LF, or CRLF. What-
+       ever is selected when PCRE is built can be overridden when the  library
        functions are called.

BUILDING SHARED AND STATIC LIBRARIES

-       The PCRE building process uses libtool to build both shared and  static
-       Unix  libraries by default. You can suppress one of these by adding one
+       The  PCRE building process uses libtool to build both shared and static
+       Unix libraries by default. You can suppress one of these by adding  one
        of

          --disable-shared
@@ -407,9 +412,9 @@
 POSIX MALLOC USAGE

        When PCRE is called through the POSIX interface (see the pcreposix doc-
-       umentation),  additional  working  storage  is required for holding the
-       pointers to capturing substrings, because PCRE requires three  integers
-       per  substring,  whereas  the POSIX interface provides only two. If the
+       umentation), additional working storage is  required  for  holding  the
+       pointers  to capturing substrings, because PCRE requires three integers
+       per substring, whereas the POSIX interface provides only  two.  If  the
        number of expected substrings is small, the wrapper function uses space
        on the stack, because this is faster than using malloc() for each call.
        The default threshold above which the stack is no longer used is 10; it
@@ -422,112 +427,112 @@

HANDLING VERY LARGE PATTERNS

-       Within  a  compiled  pattern,  offset values are used to point from one
-       part to another (for example, from an opening parenthesis to an  alter-
-       nation  metacharacter).  By default, two-byte values are used for these
-       offsets, leading to a maximum size for a  compiled  pattern  of  around
-       64K.  This  is sufficient to handle all but the most gigantic patterns.
-       Nevertheless, some people do want to process enormous patterns,  so  it
-       is  possible  to compile PCRE to use three-byte or four-byte offsets by
+       Within a compiled pattern, offset values are used  to  point  from  one
+       part  to another (for example, from an opening parenthesis to an alter-
+       nation metacharacter). By default, two-byte values are used  for  these
+       offsets,  leading  to  a  maximum size for a compiled pattern of around
+       64K. This is sufficient to handle all but the most  gigantic  patterns.
+       Nevertheless,  some  people do want to process enormous patterns, so it
+       is possible to compile PCRE to use three-byte or four-byte  offsets  by
        adding a setting such as

          --with-link-size=3

-       to the configure command. The value given must be 2,  3,  or  4.  Using
-       longer  offsets slows down the operation of PCRE because it has to load
+       to  the  configure  command.  The value given must be 2, 3, or 4. Using
+       longer offsets slows down the operation of PCRE because it has to  load
        additional bytes when handling them.

AVOIDING EXCESSIVE STACK USAGE

        When matching with the pcre_exec() function, PCRE implements backtrack-
-       ing  by  making recursive calls to an internal function called match().
-       In environments where the size of the stack is limited,  this  can  se-
-       verely  limit  PCRE's operation. (The Unix environment does not usually
+       ing by making recursive calls to an internal function  called  match().
+       In  environments  where  the size of the stack is limited, this can se-
+       verely limit PCRE's operation. (The Unix environment does  not  usually
        suffer from this problem, but it may sometimes be necessary to increase
-       the  maximum  stack size.  There is a discussion in the pcrestack docu-
-       mentation.) An alternative approach to recursion that uses memory  from
-       the  heap  to remember data, instead of using recursive function calls,
-       has been implemented to work round the problem of limited  stack  size.
+       the maximum stack size.  There is a discussion in the  pcrestack  docu-
+       mentation.)  An alternative approach to recursion that uses memory from
+       the heap to remember data, instead of using recursive  function  calls,
+       has  been  implemented to work round the problem of limited stack size.
        If you want to build a version of PCRE that works this way, add

          --disable-stack-for-recursion

-       to  the  configure  command. With this configuration, PCRE will use the
-       pcre_stack_malloc and pcre_stack_free variables to call memory  manage-
-       ment  functions. By default these point to malloc() and free(), but you
+       to the configure command. With this configuration, PCRE  will  use  the
+       pcre_stack_malloc  and pcre_stack_free variables to call memory manage-
+       ment functions. By default these point to malloc() and free(), but  you
        can replace the pointers so that your own functions are used.

-       Separate functions are  provided  rather  than  using  pcre_malloc  and
-       pcre_free  because  the  usage  is  very  predictable:  the block sizes
-       requested are always the same, and  the  blocks  are  always  freed  in
-       reverse  order.  A calling program might be able to implement optimized
-       functions that perform better  than  malloc()  and  free().  PCRE  runs
+       Separate  functions  are  provided  rather  than  using pcre_malloc and
+       pcre_free because the  usage  is  very  predictable:  the  block  sizes
+       requested  are  always  the  same,  and  the blocks are always freed in
+       reverse order. A calling program might be able to  implement  optimized
+       functions  that  perform  better  than  malloc()  and free(). PCRE runs
        noticeably more slowly when built in this way. This option affects only
-       the   pcre_exec()   function;   it    is    not    relevant   for   the
+       the   pcre_exec()   function;   it    is    not    relevant   for  the
        pcre_dfa_exec() function.

LIMITING PCRE RESOURCE USAGE

-       Internally,  PCRE has a function called match(), which it calls repeat-
-       edly  (sometimes  recursively)  when  matching  a  pattern   with   the
-       pcre_exec()  function.  By controlling the maximum number of times this
-       function may be called during a single matching operation, a limit  can
-       be  placed  on  the resources used by a single call to pcre_exec(). The
-       limit can be changed at run time, as described in the pcreapi  documen-
-       tation.  The default is 10 million, but this can be changed by adding a
+       Internally, PCRE has a function called match(), which it calls  repeat-
+       edly   (sometimes   recursively)  when  matching  a  pattern  with  the
+       pcre_exec() function. By controlling the maximum number of  times  this
+       function  may be called during a single matching operation, a limit can
+       be placed on the resources used by a single call  to  pcre_exec().  The
+       limit  can be changed at run time, as described in the pcreapi documen-
+       tation. The default is 10 million, but this can be changed by adding  a
        setting such as

          --with-match-limit=500000

-       to  the  configure  command.  This  setting  has  no  effect   on   the
+       to   the   configure  command.  This  setting  has  no  effect  on  the
        pcre_dfa_exec() matching function.

-       In  some  environments  it is desirable to limit the depth of recursive
+       In some environments it is desirable to limit the  depth  of  recursive
        calls of match() more strictly than the total number of calls, in order
-       to  restrict  the maximum amount of stack (or heap, if --disable-stack-
+       to restrict the maximum amount of stack (or heap,  if  --disable-stack-
        for-recursion is specified) that is used. A second limit controls this;
-       it  defaults  to  the  value  that is set for --with-match-limit, which
-       imposes no additional constraints. However, you can set a  lower  limit
+       it defaults to the value that  is  set  for  --with-match-limit,  which
+       imposes  no  additional constraints. However, you can set a lower limit
        by adding, for example,

          --with-match-limit-recursion=10000

-       to  the  configure  command.  This  value can also be overridden at run
+       to the configure command. This value can  also  be  overridden  at  run
        time.
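
        The difference between the two limits can be illustrated with a toy
        backtracking matcher. This is a sketch under stated assumptions, not
        PCRE's match() function: toy_match, toy_exec, and the TOY_* codes are
        hypothetical names, and the pattern syntax is limited to literal
        characters, "." and "*", anchored at both ends. One counter caps the
        total number of calls; a separate check caps the nesting depth.

```c
#include <assert.h>
#include <stddef.h>

enum {
    TOY_NOMATCH   =  0,
    TOY_MATCH     =  1,
    TOY_ERR_CALLS = -1,   /* total-call limit exceeded */
    TOY_ERR_DEPTH = -2    /* recursion-depth limit exceeded */
};

typedef struct {
    unsigned long calls;        /* calls made so far in this operation */
    unsigned long call_limit;   /* cap on the total number of calls */
    unsigned long depth_limit;  /* cap on the nesting depth */
} toy_limits;

static int toy_match(const char *pat, const char *s,
                     toy_limits *lim, unsigned long depth)
{
    if (++lim->calls > lim->call_limit)
        return TOY_ERR_CALLS;
    if (depth > lim->depth_limit)
        return TOY_ERR_DEPTH;

    if (*pat == '\0')
        return *s == '\0' ? TOY_MATCH : TOY_NOMATCH;

    if (pat[1] == '*') {
        /* Greedy: first try to consume one more character, and only if
           that fails backtrack to skipping the starred item. */
        if (*s != '\0' && (*pat == '.' || *pat == *s)) {
            int rc = toy_match(pat, s + 1, lim, depth + 1);
            if (rc != TOY_NOMATCH)
                return rc;      /* a match or an error ends the search */
        }
        return toy_match(pat + 2, s, lim, depth + 1);
    }

    if (*s != '\0' && (*pat == '.' || *pat == *s))
        return toy_match(pat + 1, s + 1, lim, depth + 1);
    return TOY_NOMATCH;
}

/* Run one matching operation with fresh counters. */
int toy_exec(const char *pat, const char *subject,
             unsigned long call_limit, unsigned long depth_limit)
{
    toy_limits lim;
    assert(pat != NULL && subject != NULL);
    lim.calls = 0;
    lim.call_limit = call_limit;
    lim.depth_limit = depth_limit;
    return toy_match(pat, subject, &lim, 0);
}
```

        A pattern such as "a*a*a*b" applied to a run of "a" characters with no
        "b" backtracks heavily while staying shallow, so it trips the call
        limit long before a generous depth limit; a small depth limit stops
        the same search after only a handful of calls.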

CREATING CHARACTER TABLES AT BUILD TIME

-       PCRE uses fixed tables for processing characters whose code values  are
-       less  than 256. By default, PCRE is built with a set of tables that are
-       distributed in the file pcre_chartables.c.dist. These  tables  are  for
+       PCRE  uses fixed tables for processing characters whose code values are
+       less than 256. By default, PCRE is built with a set of tables that  are
+       distributed  in  the  file pcre_chartables.c.dist. These tables are for
        ASCII codes only. If you add

          --enable-rebuild-chartables

-       to  the  configure  command, the distributed tables are no longer used.
-       Instead, a program called dftables is compiled and  run.  This  outputs
+       to the configure command, the distributed tables are  no  longer  used.
+       Instead,  a  program  called dftables is compiled and run. This outputs
        the source for a new set of tables, created in the default  locale  of
        C runtime system. (This method of replacing the tables does not work if
-       you  are cross compiling, because dftables is run on the local host. If
-       you need to create alternative tables when cross  compiling,  you  will
+       you are cross compiling, because dftables is run on the local host.  If
+       you  need  to  create alternative tables when cross compiling, you will
        have to do so "by hand".)
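
        As a rough idea of what building tables in the current locale involves,
        the sketch below fills two 256-entry tables from the C library's ctype
        functions. It is a simplified illustration rather than the dftables
        source: build_case_tables is a hypothetical name, and the real tables
        hold more than case mappings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Fill lower[] with the lower-case form of every byte value and flip[]
   with the opposite-case form, as defined by the current C locale. */
void build_case_tables(unsigned char lower[256], unsigned char flip[256])
{
    int c;
    assert(lower != NULL && flip != NULL);
    for (c = 0; c < 256; c++) {
        lower[c] = (unsigned char)tolower(c);
        flip[c]  = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
    }
}
```

        Because the result depends on the locale in force when the function
        runs, a program wanting tables for a particular locale would call
        setlocale() first. This is also why the approach does not carry over
        to cross compiling: the locale consulted is that of the build host.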

USING EBCDIC CODE

-       PCRE  assumes  by  default that it will run in an environment where the
-       character code is ASCII (or Unicode, which is  a  superset  of  ASCII).
-       This  is  the  case for most computer operating systems. PCRE can, how-
+       PCRE assumes by default that it will run in an  environment  where  the
+       character  code  is  ASCII  (or Unicode, which is a superset of ASCII).
+       This is the case for most computer operating systems.  PCRE  can,  how-
        ever, be compiled to run in an EBCDIC environment by adding

          --enable-ebcdic

        to the configure command. This setting implies --enable-rebuild-charta-
-       bles.  You  should  only  use  it if you know that you are in an EBCDIC
-       environment (for example,  an  IBM  mainframe  operating  system).  The
+       bles. You should only use it if you know that  you  are  in  an  EBCDIC
+       environment  (for  example,  an  IBM  mainframe  operating system). The
        --enable-ebcdic option is incompatible with --enable-utf8.

@@ -541,7 +546,7 @@
          --enable-pcregrep-libbz2

        to the configure command. These options naturally require that the rel-
-       evant libraries are installed on your system. Configuration  will  fail
+       evant  libraries  are installed on your system. Configuration will fail
        if they are not.

@@ -551,24 +556,24 @@

          --enable-pcretest-libreadline

-       to  the  configure  command,  pcretest  is  linked with the libreadline
-       library, and when its input is from a terminal, it reads it  using  the
+       to the configure command,  pcretest  is  linked  with  the  libreadline
+       library,  and  when its input is from a terminal, it reads it using the
        readline() function. This provides line-editing and history facilities.
        Note that libreadline is GPL-licenced, so if you distribute a binary of
        pcretest linked in this way, there may be licensing issues.

-       Setting  this  option  causes  the -lreadline option to be added to the
-       pcretest build. In many operating environments with  a  system-installed
+       Setting this option causes the -lreadline option to  be  added  to  the
+       pcretest  build.  In many operating environments with a system-installed
        libreadline this is sufficient. However, in some environments (e.g.  if
-       an unmodified distribution version of readline is in use),  some  extra
-       configuration  may  be necessary. The INSTALL file for libreadline says
+       an  unmodified  distribution version of readline is in use), some extra
+       configuration may be necessary. The INSTALL file for  libreadline  says
        this:

          "Readline uses the termcap functions, but does not link with the
          termcap or curses library itself, allowing applications which link
          with readline the to choose an appropriate library."

-       If your environment has not been set up so that an appropriate  library
+       If  your environment has not been set up so that an appropriate library
        is automatically included, you may need to add something like

          LIBS="-lncurses"
@@ -590,7 +595,7 @@

REVISION

-       Last updated: 17 March 2009
+       Last updated: 06 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -696,67 +701,72 @@
        at the fourth character of the subject. The algorithm does not automat-
        ically move on to find matches that start at later positions.

+       Although the general principle of this matching algorithm  is  that  it
+       scans  the subject string only once, without backtracking, there is one
+       exception: when a lookbehind assertion is  encountered,  the  preceding
+       characters have to be re-inspected.
+
        There are a number of features of PCRE regular expressions that are not
        supported by the alternative matching algorithm. They are as follows:

-       1.  Because  the  algorithm  finds  all possible matches, the greedy or
-       ungreedy nature of repetition quantifiers is not relevant.  Greedy  and
+       1. Because the algorithm finds all  possible  matches,  the  greedy  or
+       ungreedy  nature  of repetition quantifiers is not relevant. Greedy and
        ungreedy quantifiers are treated in exactly the same way. However, pos-
-       sessive quantifiers can make a difference when what follows could  also
+       sessive  quantifiers can make a difference when what follows could also
        match what is quantified, for example in a pattern like this:

          ^a++\w!

-       This  pattern matches "aaab!" but not "aaa!", which would be matched by
-       a non-possessive quantifier. Similarly, if an atomic group is  present,
-       it  is matched as if it were a standalone pattern at the current point,
-       and the longest match is then "locked in" for the rest of  the  overall
+       This pattern matches "aaab!" but not "aaa!", which would be matched  by
+       a  non-possessive quantifier. Similarly, if an atomic group is present,
+       it is matched as if it were a standalone pattern at the current  point,
+       and  the  longest match is then "locked in" for the rest of the overall
        pattern.

        2. When dealing with multiple paths through the tree simultaneously, it
-       is not straightforward to keep track of  captured  substrings  for  the
-       different  matching  possibilities,  and  PCRE's implementation of this
+       is  not  straightforward  to  keep track of captured substrings for the
+       different matching possibilities, and  PCRE's  implementation  of  this
        algorithm does not attempt to do this. This means that no captured sub-
        strings are available.

-       3.  Because no substrings are captured, back references within the pat-
+       3. Because no substrings are captured, back references within the  pat-
        tern are not supported, and cause errors if encountered.

-       4. For the same reason, conditional expressions that use  a  backrefer-
-       ence  as  the  condition or test for a specific group recursion are not
+       4.  For  the same reason, conditional expressions that use a backrefer-
+       ence as the condition or test for a specific group  recursion  are  not
        supported.

-       5. Because many paths through the tree may be  active,  the  \K  escape
+       5.  Because  many  paths  through the tree may be active, the \K escape
        sequence, which resets the start of the match when encountered (but may
-       be on some paths and not on others), is not  supported.  It  causes  an
+       be  on  some  paths  and not on others), is not supported. It causes an
        error if encountered.

-       6.  Callouts  are  supported, but the value of the capture_top field is
+       6. Callouts are supported, but the value of the  capture_top  field  is
        always 1, and the value of the capture_last field is always -1.

-       7. The \C escape sequence, which (in the standard algorithm) matches  a
-       single  byte, even in UTF-8 mode, is not supported because the alterna-
-       tive algorithm moves through the subject  string  one  character  at  a
+       7.  The \C escape sequence, which (in the standard algorithm) matches a
+       single byte, even in UTF-8 mode, is not supported because the  alterna-
+       tive  algorithm  moves  through  the  subject string one character at a
        time, for all active paths through the tree.

-       8.  Except for (*FAIL), the backtracking control verbs such as (*PRUNE)
-       are not supported. (*FAIL) is supported, and  behaves  like  a  failing
+       8. Except for (*FAIL), the backtracking control verbs such as  (*PRUNE)
+       are  not  supported.  (*FAIL)  is supported, and behaves like a failing
        negative assertion.

ADVANTAGES OF THE ALTERNATIVE ALGORITHM

-       Using  the alternative matching algorithm provides the following advan-
+       Using the alternative matching algorithm provides the following  advan-
        tages:

        1. All possible matches (at a single point in the subject) are automat-
-       ically  found,  and  in particular, the longest match is found. To find
+       ically found, and in particular, the longest match is  found.  To  find
        more than one match using the standard algorithm, you have to do kludgy
        things with callouts.

-       2.  Because  the  alternative  algorithm  scans the subject string just
-       once, and never needs to backtrack, it is possible to  pass  very  long
-       subject  strings  to  the matching function in several pieces, checking
+       2. Because the alternative algorithm  scans  the  subject  string  just
+       once,  and  never  needs to backtrack, it is possible to pass very long
+       subject strings to the matching function in  several  pieces,  checking
        for partial matching each time.

@@ -764,8 +774,8 @@

        The alternative algorithm suffers from a number of disadvantages:

-       1. It is substantially slower than  the  standard  algorithm.  This  is
-       partly  because  it has to search for all possible matches, but is also
+       1.  It  is  substantially  slower  than the standard algorithm. This is
+       partly because it has to search for all possible matches, but  is  also
        because it is less susceptible to optimization.

        2. Capturing parentheses and back references are not supported.
@@ -783,7 +793,7 @@

REVISION

-       Last updated: 25 August 2009
+       Last updated: 05 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -902,12 +912,13 @@
        A second matching function, pcre_dfa_exec(), which is not Perl-compati-
        ble, is also provided. This uses a different algorithm for  the  match-
        ing.  The  alternative algorithm finds all possible matches (at a given
-       point in the subject), and scans the subject just once.  However,  this
-       algorithm does not return captured substrings. A description of the two
-       matching algorithms and their advantages and disadvantages is given  in
-       the pcrematching documentation.
+       point in the subject), and scans the subject just  once  (unless  there
+       are  lookbehind  assertions).  However,  this algorithm does not return
+       captured substrings. A description of the two matching  algorithms  and
+       their  advantages  and disadvantages is given in the pcrematching docu-
+       mentation.

-       In  addition  to  the  main compiling and matching functions, there are
+       In addition to the main compiling and  matching  functions,  there  are
        convenience functions for extracting captured substrings from a subject
        string that is matched by pcre_exec(). They are:

@@ -922,91 +933,91 @@
        pcre_free_substring() and pcre_free_substring_list() are also provided,
        to free the memory used for extracted strings.

-       The function pcre_maketables() is used to  build  a  set  of  character
-       tables   in   the   current   locale  for  passing  to  pcre_compile(),
-       pcre_exec(), or pcre_dfa_exec(). This is an optional facility  that  is
-       provided  for  specialist  use.  Most  commonly,  no special tables are
-       passed, in which case internal tables that are generated when  PCRE  is
+       The  function  pcre_maketables()  is  used  to build a set of character
+       tables  in  the  current  locale   for   passing   to   pcre_compile(),
+       pcre_exec(),  or  pcre_dfa_exec(). This is an optional facility that is
+       provided for specialist use.  Most  commonly,  no  special  tables  are
+       passed,  in  which case internal tables that are generated when PCRE is
        built are used.

-       The  function  pcre_fullinfo()  is used to find out information about a
-       compiled pattern; pcre_info() is an obsolete version that returns  only
-       some  of  the available information, but is retained for backwards com-
-       patibility.  The function pcre_version() returns a pointer to a  string
+       The function pcre_fullinfo() is used to find out  information  about  a
+       compiled  pattern; pcre_info() is an obsolete version that returns only
+       some of the available information, but is retained for  backwards  com-
+       patibility.   The function pcre_version() returns a pointer to a string
        containing the version of PCRE and its date of release.

-       The  function  pcre_refcount()  maintains  a  reference count in a data
-       block containing a compiled pattern. This is provided for  the  benefit
+       The function pcre_refcount() maintains a  reference  count  in  a  data
+       block  containing  a compiled pattern. This is provided for the benefit
        of object-oriented applications.

-       The  global  variables  pcre_malloc and pcre_free initially contain the
-       entry points of the standard malloc()  and  free()  functions,  respec-
+       The global variables pcre_malloc and pcre_free  initially  contain  the
+       entry  points  of  the  standard malloc() and free() functions, respec-
        tively. PCRE calls the memory management functions via these variables,
-       so a calling program can replace them if it  wishes  to  intercept  the
+       so  a  calling  program  can replace them if it wishes to intercept the
        calls. This should be done before calling any PCRE functions.

-       The  global  variables  pcre_stack_malloc  and pcre_stack_free are also
-       indirections to memory management functions.  These  special  functions
-       are  used  only  when  PCRE is compiled to use the heap for remembering
+       The global variables pcre_stack_malloc  and  pcre_stack_free  are  also
+       indirections  to  memory  management functions. These special functions
+       are used only when PCRE is compiled to use  the  heap  for  remembering
        data, instead of recursive function calls, when running the pcre_exec()
-       function.  See  the  pcrebuild  documentation  for details of how to do
-       this. It is a non-standard way of building PCRE, for  use  in  environ-
-       ments  that  have  limited stacks. Because of the greater use of memory
-       management, it runs more slowly. Separate  functions  are  provided  so
-       that  special-purpose  external  code  can  be used for this case. When
-       used, these functions are always called in a  stack-like  manner  (last
-       obtained,  first freed), and always for memory blocks of the same size.
-       There is a discussion about PCRE's stack usage in the  pcrestack  docu-
+       function. See the pcrebuild documentation for  details  of  how  to  do
+       this.  It  is  a non-standard way of building PCRE, for use in environ-
+       ments that have limited stacks. Because of the greater  use  of  memory
+       management,  it  runs  more  slowly. Separate functions are provided so
+       that special-purpose external code can be  used  for  this  case.  When
+       used,  these  functions  are always called in a stack-like manner (last
+       obtained, first freed), and always for memory blocks of the same  size.
+       There  is  a discussion about PCRE's stack usage in the pcrestack docu-
        mentation.

        The global variable pcre_callout initially contains NULL. It can be set
-       by the caller to a "callout" function, which PCRE  will  then  call  at
-       specified  points during a matching operation. Details are given in the
+       by  the  caller  to  a "callout" function, which PCRE will then call at
+       specified points during a matching operation. Details are given in  the
        pcrecallout documentation.

NEWLINES

-       PCRE supports five different conventions for indicating line breaks  in
-       strings:  a  single  CR (carriage return) character, a single LF (line-
+       PCRE  supports five different conventions for indicating line breaks in
+       strings: a single CR (carriage return) character, a  single  LF  (line-
        feed) character, the two-character sequence CRLF, any of the three pre-
-       ceding,  or any Unicode newline sequence. The Unicode newline sequences
-       are the three just mentioned, plus the single characters  VT  (vertical
-       tab,  U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
+       ceding, or any Unicode newline sequence. The Unicode newline  sequences
+       are  the  three just mentioned, plus the single characters VT (vertical
+       tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS  (line
        separator, U+2028), and PS (paragraph separator, U+2029).

-       Each of the first three conventions is used by at least  one  operating
-       system  as its standard newline sequence. When PCRE is built, a default
-       can be specified.  The default default is LF, which is the  Unix  stan-
-       dard.  When  PCRE  is run, the default can be overridden, either when a
+       Each  of  the first three conventions is used by at least one operating
+       system as its standard newline sequence. When PCRE is built, a  default
+       can  be  specified.  The default default is LF, which is the Unix stan-
+       dard. When PCRE is run, the default can be overridden,  either  when  a
        pattern is compiled, or when it is matched.

        At compile time, the newline convention can be specified by the options
-       argument  of  pcre_compile(), or it can be specified by special text at
+       argument of pcre_compile(), or it can be specified by special  text  at
        the start of the pattern itself; this overrides any other settings. See
        the pcrepattern page for details of the special character sequences.

        In the PCRE documentation the word "newline" is used to mean "the char-
-       acter or pair of characters that indicate a line break". The choice  of
-       newline  convention  affects  the  handling of the dot, circumflex, and
+       acter  or pair of characters that indicate a line break". The choice of
+       newline convention affects the handling of  the  dot,  circumflex,  and
        dollar metacharacters, the handling of #-comments in /x mode, and, when
-       CRLF  is a recognized line ending sequence, the match position advance-
+       CRLF is a recognized line ending sequence, the match position  advance-
        ment for a non-anchored pattern. There is more detail about this in the
        section on pcre_exec() options below.

-       The  choice of newline convention does not affect the interpretation of
-       the \n or \r escape sequences, nor does  it  affect  what  \R  matches,
+       The choice of newline convention does not affect the interpretation  of
+       the  \n  or  \r  escape  sequences, nor does it affect what \R matches,
        which is controlled in a similar way, but by separate options.

MULTITHREADING

-       The  PCRE  functions  can be used in multi-threading applications, with
+       The PCRE functions can be used in  multi-threading  applications,  with
        the  proviso  that  the  memory  management  functions  pointed  to  by
        pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the
        callout function pointed to by pcre_callout, are shared by all threads.

-       The compiled form of a regular expression is not altered during  match-
+       The  compiled form of a regular expression is not altered during match-
        ing, so the same compiled pattern can safely be used by several threads
        at once.

@@ -1014,10 +1025,10 @@
SAVING PRECOMPILED PATTERNS FOR LATER USE

        The compiled form of a regular expression can be saved and re-used at a
-       later  time,  possibly by a different program, and even on a host other
-       than the one on which  it  was  compiled.  Details  are  given  in  the
-       pcreprecompile  documentation.  However, compiling a regular expression
-       with one version of PCRE for use with a different version is not  guar-
+       later time, possibly by a different program, and even on a  host  other
+       than  the  one  on  which  it  was  compiled.  Details are given in the
+       pcreprecompile documentation. However, compiling a  regular  expression
+       with  one version of PCRE for use with a different version is not guar-
        anteed to work and may cause crashes.

@@ -1025,79 +1036,79 @@

        int pcre_config(int what, void *where);

-       The  function pcre_config() makes it possible for a PCRE client to dis-
+       The function pcre_config() makes it possible for a PCRE client to  dis-
        cover which optional features have been compiled into the PCRE library.
-       The  pcrebuild documentation has more details about these optional fea-
+       The pcrebuild documentation has more details about these optional  fea-
        tures.

-       The first argument for pcre_config() is an  integer,  specifying  which
+       The  first  argument  for pcre_config() is an integer, specifying which
        information is required; the second argument is a pointer to a variable
-       into which the information is  placed.  The  following  information  is
+       into  which  the  information  is  placed. The following information is
        available:

          PCRE_CONFIG_UTF8

-       The  output is an integer that is set to one if UTF-8 support is avail-
+       The output is an integer that is set to one if UTF-8 support is  avail-
        able; otherwise it is set to zero.

          PCRE_CONFIG_UNICODE_PROPERTIES

-       The output is an integer that is set to  one  if  support  for  Unicode
+       The  output  is  an  integer  that is set to one if support for Unicode
        character properties is available; otherwise it is set to zero.

          PCRE_CONFIG_NEWLINE

-       The  output  is  an integer whose value specifies the default character
-       sequence that is recognized as meaning "newline". The five values  that
+       The output is an integer whose value specifies  the  default  character
+       sequence  that is recognized as meaning "newline". The five values that
        are supported are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF,
-       and -1 for ANY.  Though they are derived from ASCII,  the  same  values
+       and  -1  for  ANY.  Though they are derived from ASCII, the same values
        are returned in EBCDIC environments. The default should normally corre-
        spond to the standard sequence for your operating system.

          PCRE_CONFIG_BSR

        The output is an integer whose value indicates what character sequences
-       the  \R  escape sequence matches by default. A value of 0 means that \R
-       matches any Unicode line ending sequence; a value of 1  means  that  \R
+       the \R escape sequence matches by default. A value of 0 means  that  \R
+       matches  any  Unicode  line ending sequence; a value of 1 means that \R
        matches only CR, LF, or CRLF. The default can be overridden when a pat-
        tern is compiled or matched.

          PCRE_CONFIG_LINK_SIZE

-       The output is an integer that contains the number  of  bytes  used  for
+       The  output  is  an  integer that contains the number of bytes used for
        internal linkage in compiled regular expressions. The value is 2, 3, or
-       4. Larger values allow larger regular expressions to  be  compiled,  at
-       the  expense  of  slower matching. The default value of 2 is sufficient
-       for all but the most massive patterns, since  it  allows  the  compiled
+       4.  Larger  values  allow larger regular expressions to be compiled, at
+       the expense of slower matching. The default value of  2  is  sufficient
+       for  all  but  the  most massive patterns, since it allows the compiled
        pattern to be up to 64K in size.

          PCRE_CONFIG_POSIX_MALLOC_THRESHOLD

-       The  output  is  an integer that contains the threshold above which the
-       POSIX interface uses malloc() for output vectors. Further  details  are
+       The output is an integer that contains the threshold  above  which  the
+       POSIX  interface  uses malloc() for output vectors. Further details are
        given in the pcreposix documentation.

          PCRE_CONFIG_MATCH_LIMIT

-       The  output is a long integer that gives the default limit for the num-
-       ber of internal matching function calls  in  a  pcre_exec()  execution.
+       The output is a long integer that gives the default limit for the  num-
+       ber  of  internal  matching  function calls in a pcre_exec() execution.
        Further details are given with pcre_exec() below.

          PCRE_CONFIG_MATCH_LIMIT_RECURSION

        The output is a long integer that gives the default limit for the depth
-       of  recursion  when  calling  the  internal  matching  function  in   a
-       pcre_exec()  execution.  Further  details  are  given  with pcre_exec()
+       of   recursion  when  calling  the  internal  matching  function  in  a
+       pcre_exec() execution.  Further  details  are  given  with  pcre_exec()
        below.

          PCRE_CONFIG_STACKRECURSE

-       The output is an integer that is set to one if internal recursion  when
+       The  output is an integer that is set to one if internal recursion when
        running pcre_exec() is implemented by recursive function calls that use
-       the stack to remember their state. This is the usual way that  PCRE  is
+       the  stack  to remember their state. This is the usual way that PCRE is
        compiled. The output is zero if PCRE was compiled to use blocks of data
-       on the  heap  instead  of  recursive  function  calls.  In  this  case,
-       pcre_stack_malloc  and  pcre_stack_free  are  called  to  manage memory
+       on  the  heap  instead  of  recursive  function  calls.  In  this case,
+       pcre_stack_malloc and  pcre_stack_free  are  called  to  manage  memory
        blocks on the heap, thus avoiding the use of the stack.
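
        The PCRE_CONFIG_NEWLINE values listed above pack the ASCII codes of the
        sequence into one integer: 3338 is 13 * 256 + 10, that is, CR followed
        by LF. The following sketch decodes such a value. The function name
        newline_sequence is an assumption made for illustration; it is not
        part of the PCRE API.

```c
#include <assert.h>
#include <stddef.h>

/* Decode a newline value into its literal character codes.
   Returns the number of codes written to seq (1 or 2), or 0 for the
   negative values (-1 for ANY, -2 for ANYCRLF), which name a set of
   sequences rather than one literal sequence. */
int newline_sequence(int value, unsigned char seq[2])
{
    assert(seq != NULL);
    if (value < 0)
        return 0;
    if (value > 255) {
        seq[0] = (unsigned char)((value >> 8) & 0xff);  /* first code */
        seq[1] = (unsigned char)(value & 0xff);         /* second code */
        return 2;
    }
    seq[0] = (unsigned char)value;
    return 1;
}
```

        The codes are ASCII regardless of the host character set, which is
        consistent with the statement that the same values are returned in
        EBCDIC environments.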

@@ -1114,56 +1125,56 @@

        Either of the functions pcre_compile() or pcre_compile2() can be called
        to compile a pattern into an internal form. The only difference between
-       the two interfaces is that pcre_compile2() has an additional  argument,
+       the  two interfaces is that pcre_compile2() has an additional argument,
        errorcodeptr, via which a numerical error code can be returned.

        The pattern is a C string terminated by a binary zero, and is passed in
-       the pattern argument. A pointer to a single block  of  memory  that  is
-       obtained  via  pcre_malloc is returned. This contains the compiled code
+       the  pattern  argument.  A  pointer to a single block of memory that is
+       obtained via pcre_malloc is returned. This contains the  compiled  code
        and related data. The pcre type is defined for the returned block; this
        is a typedef for a structure whose contents are not externally defined.
        It is up to the caller to free the memory (via pcre_free) when it is no
        longer required.

-       Although  the compiled code of a PCRE regex is relocatable, that is, it
+       Although the compiled code of a PCRE regex is relocatable, that is,  it
        does not depend on memory location, the complete pcre data block is not
-       fully  relocatable, because it may contain a copy of the tableptr argu-
+       fully relocatable, because it may contain a copy of the tableptr  argu-
        ment, which is an address (see below).

        The options argument contains various bit settings that affect the com-
-       pilation.  It  should be zero if no options are required. The available
-       options are described below. Some of them (in  particular,  those  that
-       are  compatible  with  Perl,  but also some others) can also be set and
-       unset from within the pattern (see  the  detailed  description  in  the
-       pcrepattern  documentation). For those options that can be different in
-       different parts of the pattern, the contents of  the  options  argument
+       pilation. It should be zero if no options are required.  The  available
+       options  are  described  below. Some of them (in particular, those that
+       are compatible with Perl, but also some others) can  also  be  set  and
+       unset  from  within  the  pattern  (see the detailed description in the
+       pcrepattern documentation). For those options that can be different  in
+       different  parts  of  the pattern, the contents of the options argument
        specify their initial settings at the start of compilation and  execu-
-       tion. The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at  the
+       tion.  The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at the
        time of matching as well as at compile time.

        If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
-       if compilation of a pattern fails,  pcre_compile()  returns  NULL,  and
+       if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and
        sets the variable pointed to by errptr to point to a textual error mes-
        sage. This is a static string that is part of the library. You must not
        try to free it. The offset from the start of the pattern to the charac-
        ter where the error was discovered is placed in the variable pointed to
-       by  erroffset,  which must not be NULL. If it is, an immediate error is
+       by erroffset, which must not be NULL. If it is, an immediate  error  is
        given.

-       If pcre_compile2() is used instead of pcre_compile(),  and  the  error-
-       codeptr  argument is not NULL, a non-zero error code number is returned
-       via this argument in the event of an error. This is in addition to  the
+       If  pcre_compile2()  is  used instead of pcre_compile(), and the error-
+       codeptr argument is not NULL, a non-zero error code number is  returned
+       via  this argument in the event of an error. This is in addition to the
        textual error message. Error codes and messages are listed below.

-       If  the  final  argument, tableptr, is NULL, PCRE uses a default set of
-       character tables that are  built  when  PCRE  is  compiled,  using  the
-       default  C  locale.  Otherwise, tableptr must be an address that is the
-       result of a call to pcre_maketables(). This value is  stored  with  the
-       compiled  pattern,  and used again by pcre_exec(), unless another table
+       If the final argument, tableptr, is NULL, PCRE uses a  default  set  of
+       character  tables  that  are  built  when  PCRE  is compiled, using the
+       default C locale. Otherwise, tableptr must be an address  that  is  the
+       result  of  a  call to pcre_maketables(). This value is stored with the
+       compiled pattern, and used again by pcre_exec(), unless  another  table
        pointer is passed to it. For more discussion, see the section on locale
        support below.

-       This  code  fragment  shows a typical straightforward call to pcre_com-
+       This code fragment shows a typical straightforward  call  to  pcre_com-
        pile():

          pcre *re;
@@ -1176,137 +1187,137 @@
            &erroffset,       /* for error offset */
            NULL);            /* use default character tables */

-       The following names for option bits are defined in  the  pcre.h  header
+       The  following  names  for option bits are defined in the pcre.h header
        file:

          PCRE_ANCHORED

        If this bit is set, the pattern is forced to be "anchored", that is, it
-       is constrained to match only at the first matching point in the  string
-       that  is being searched (the "subject string"). This effect can also be
-       achieved by appropriate constructs in the pattern itself, which is  the
+       is  constrained to match only at the first matching point in the string
+       that is being searched (the "subject string"). This effect can also  be
+       achieved  by appropriate constructs in the pattern itself, which is the
        only way to do it in Perl.

          PCRE_AUTO_CALLOUT

        If this bit is set, pcre_compile() automatically inserts callout items,
-       all with number 255, before each pattern item. For  discussion  of  the
+       all  with  number  255, before each pattern item. For discussion of the
        callout facility, see the pcrecallout documentation.

          PCRE_BSR_ANYCRLF
          PCRE_BSR_UNICODE

        These options (which are mutually exclusive) control what the \R escape
-       sequence matches. The choice is either to match only CR, LF,  or  CRLF,
+       sequence  matches.  The choice is either to match only CR, LF, or CRLF,
        or to match any Unicode newline sequence. The default is specified when
        PCRE is built. It can be overridden from within the pattern, or by set-
        ting an option when a compiled pattern is matched.

          PCRE_CASELESS

-       If  this  bit is set, letters in the pattern match both upper and lower
-       case letters. It is equivalent to Perl's  /i  option,  and  it  can  be
-       changed  within a pattern by a (?i) option setting. In UTF-8 mode, PCRE
-       always understands the concept of case for characters whose values  are
-       less  than 128, so caseless matching is always possible. For characters
-       with higher values, the concept of case is supported if  PCRE  is  com-
-       piled  with Unicode property support, but not otherwise. If you want to
-       use caseless matching for characters 128 and  above,  you  must  ensure
-       that  PCRE  is  compiled  with Unicode property support as well as with
+       If this bit is set, letters in the pattern match both upper  and  lower
+       case  letters.  It  is  equivalent  to  Perl's /i option, and it can be
+       changed within a pattern by a (?i) option setting. In UTF-8 mode,  PCRE
+       always  understands the concept of case for characters whose values are
+       less than 128, so caseless matching is always possible. For  characters
+       with  higher  values,  the concept of case is supported if PCRE is com-
+       piled with Unicode property support, but not otherwise. If you want  to
+       use  caseless  matching  for  characters 128 and above, you must ensure
+       that PCRE is compiled with Unicode property support  as  well  as  with
        UTF-8 support.

          PCRE_DOLLAR_ENDONLY

-       If this bit is set, a dollar metacharacter in the pattern matches  only
-       at  the  end  of the subject string. Without this option, a dollar also
-       matches immediately before a newline at the end of the string (but  not
-       before  any  other newlines). The PCRE_DOLLAR_ENDONLY option is ignored
-       if PCRE_MULTILINE is set.  There is no equivalent  to  this  option  in
+       If  this bit is set, a dollar metacharacter in the pattern matches only
+       at the end of the subject string. Without this option,  a  dollar  also
+       matches  immediately before a newline at the end of the string (but not
+       before any other newlines). The PCRE_DOLLAR_ENDONLY option  is  ignored
+       if  PCRE_MULTILINE  is  set.   There is no equivalent to this option in
        Perl, and no way to set it within a pattern.

          PCRE_DOTALL

        If this bit is set, a dot metacharacter in the pattern matches all char-
-       acters, including those that indicate newline. Without it, a  dot  does
-       not  match  when  the  current position is at a newline. This option is
-       equivalent to Perl's /s option, and it can be changed within a  pattern
-       by  a (?s) option setting. A negative class such as [^a] always matches
+       acters,  including  those that indicate newline. Without it, a dot does
+       not match when the current position is at a  newline.  This  option  is
+       equivalent  to Perl's /s option, and it can be changed within a pattern
+       by a (?s) option setting. A negative class such as [^a] always  matches
        newline characters, independent of the setting of this option.

          PCRE_DUPNAMES

-       If this bit is set, names used to identify capturing  subpatterns  need
+       If  this  bit is set, names used to identify capturing subpatterns need
        not be unique. This can be helpful for certain types of pattern when it
-       is known that only one instance of the named  subpattern  can  ever  be
-       matched.  There  are  more details of named subpatterns below; see also
+       is  known  that  only  one instance of the named subpattern can ever be
+       matched. There are more details of named subpatterns  below;  see  also
        the pcrepattern documentation.

          PCRE_EXTENDED

-       If this bit is set, whitespace  data  characters  in  the  pattern  are
+       If  this  bit  is  set,  whitespace  data characters in the pattern are
        totally ignored except when escaped or inside a character class. White-
        space does not include the VT character (code 11). In addition, charac-
        ters between an unescaped # outside a character class and the next new-
-       line, inclusive, are also ignored. This  is  equivalent  to  Perl's  /x
-       option,  and  it  can be changed within a pattern by a (?x) option set-
+       line,  inclusive,  are  also  ignored.  This is equivalent to Perl's /x
+       option, and it can be changed within a pattern by a  (?x)  option  set-
        ting.

-       This option makes it possible to include  comments  inside  complicated
-       patterns.   Note,  however,  that this applies only to data characters.
-       Whitespace  characters  may  never  appear  within  special   character
-       sequences  in  a  pattern,  for  example  within the sequence (?( which
+       This  option  makes  it possible to include comments inside complicated
+       patterns.  Note, however, that this applies only  to  data  characters.
+       Whitespace   characters  may  never  appear  within  special  character
+       sequences in a pattern, for  example  within  the  sequence  (?(  which
        introduces a conditional subpattern.

          PCRE_EXTRA

-       This option was invented in order to turn on  additional  functionality
-       of  PCRE  that  is  incompatible with Perl, but it is currently of very
-       little use. When set, any backslash in a pattern that is followed by  a
-       letter  that  has  no  special  meaning causes an error, thus reserving
-       these combinations for future expansion. By  default,  as  in  Perl,  a
-       backslash  followed by a letter with no special meaning is treated as a
-       literal. (Perl can, however, be persuaded to give a warning for  this.)
-       There  are  at  present no other features controlled by this option. It
+       This  option  was invented in order to turn on additional functionality
+       of PCRE that is incompatible with Perl, but it  is  currently  of  very
+       little  use. When set, any backslash in a pattern that is followed by a
+       letter that has no special meaning  causes  an  error,  thus  reserving
+       these  combinations  for  future  expansion.  By default, as in Perl, a
+       backslash followed by a letter with no special meaning is treated as  a
+       literal.  (Perl can, however, be persuaded to give a warning for this.)
+       There are at present no other features controlled by  this  option.  It
        can also be set by a (?X) option setting within a pattern.

          PCRE_FIRSTLINE

-       If this option is set, an  unanchored  pattern  is  required  to  match
-       before  or  at  the  first  newline  in  the subject string, though the
+       If  this  option  is  set,  an  unanchored pattern is required to match
+       before or at the first  newline  in  the  subject  string,  though  the
        matched text may continue over the newline.

          PCRE_JAVASCRIPT_COMPAT

        If this option is set, PCRE's behaviour is changed in some ways so that
-       it  is  compatible with JavaScript rather than Perl. The changes are as
+       it is compatible with JavaScript rather than Perl. The changes  are  as
        follows:

-       (1) A lone closing square bracket in a pattern  causes  a  compile-time
-       error,  because this is illegal in JavaScript (by default it is treated
+       (1)  A  lone  closing square bracket in a pattern causes a compile-time
+       error, because this is illegal in JavaScript (by default it is  treated
        as a data character). Thus, the pattern AB]CD becomes illegal when this
        option is set.

-       (2)  At run time, a back reference to an unset subpattern group matches
-       an empty string (by default this causes the current  matching  alterna-
-       tive  to  fail). A pattern such as (\1)(a) succeeds when this option is
-       set (assuming it can find an "a" in the subject), whereas it  fails  by
+       (2) At run time, a back reference to an unset subpattern group  matches
+       an  empty  string (by default this causes the current matching alterna-
+       tive to fail). A pattern such as (\1)(a) succeeds when this  option  is
+       set  (assuming  it can find an "a" in the subject), whereas it fails by
        default, for Perl compatibility.

          PCRE_MULTILINE

-       By  default,  PCRE  treats the subject string as consisting of a single
-       line of characters (even if it actually contains newlines). The  "start
-       of  line"  metacharacter  (^)  matches only at the start of the string,
-       while the "end of line" metacharacter ($) matches only at  the  end  of
+       By default, PCRE treats the subject string as consisting  of  a  single
+       line  of characters (even if it actually contains newlines). The "start
+       of line" metacharacter (^) matches only at the  start  of  the  string,
+       while  the  "end  of line" metacharacter ($) matches only at the end of
        the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY
        is set). This is the same as Perl.

-       When PCRE_MULTILINE is set, the "start of line"  and  "end  of  line"
-       constructs  match  immediately following or immediately before internal
-       newlines in the subject string, respectively, as well as  at  the  very
-       start  and  end.  This is equivalent to Perl's /m option, and it can be
+       When  PCRE_MULTILINE  is  set, the "start of line" and "end of line"
+       constructs match immediately following or immediately  before  internal
+       newlines  in  the  subject string, respectively, as well as at the very
+       start and end. This is equivalent to Perl's /m option, and  it  can  be
        changed within a pattern by a (?m) option setting. If there are no new-
-       lines  in  a  subject string, or no occurrences of ^ or $ in a pattern,
+       lines in a subject string, or no occurrences of ^ or $  in  a  pattern,
        setting PCRE_MULTILINE has no effect.

          PCRE_NEWLINE_CR
@@ -1315,32 +1326,32 @@
          PCRE_NEWLINE_ANYCRLF
          PCRE_NEWLINE_ANY

-       These options override the default newline definition that  was  chosen
-       when  PCRE  was built. Setting the first or the second specifies that a
-       newline is indicated by a single character (CR  or  LF,  respectively).
-       Setting  PCRE_NEWLINE_CRLF specifies that a newline is indicated by the
-       two-character CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF  specifies
+       These  options  override the default newline definition that was chosen
+       when PCRE was built. Setting the first or the second specifies  that  a
+       newline  is  indicated  by a single character (CR or LF, respectively).
+       Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by  the
+       two-character  CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF specifies
        that any of the three preceding sequences should be recognized. Setting
-       PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should  be
+       PCRE_NEWLINE_ANY  specifies that any Unicode newline sequence should be
        recognized. The Unicode newline sequences are the three just mentioned,
-       plus the single characters VT (vertical  tab,  U+000B),  FF  (formfeed,
-       U+000C),  NEL  (next line, U+0085), LS (line separator, U+2028), and PS
-       (paragraph separator, U+2029). The last  two  are  recognized  only  in
+       plus  the  single  characters  VT (vertical tab, U+000B), FF (formfeed,
+       U+000C), NEL (next line, U+0085), LS (line separator, U+2028),  and  PS
+       (paragraph  separator,  U+2029).  The  last  two are recognized only in
        UTF-8 mode.

-       The  newline  setting  in  the  options  word  uses three bits that are
+       The newline setting in the  options  word  uses  three  bits  that  are
        treated as a number, giving eight possibilities. Currently only six are
-       used  (default  plus the five values above). This means that if you set
-       more than one newline option, the combination may or may not be  sensi-
+       used (default plus the five values above). This means that if  you  set
+       more  than one newline option, the combination may or may not be sensi-
        ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
-       PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers  and
+       PCRE_NEWLINE_CRLF,  but other combinations may yield unused numbers and
        cause an error.
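
The three-bit encoding described above can be illustrated with a short C sketch. The constant values are assumed to mirror those in pcre.h for this release and should be checked against your own header; NEWLINE_MASK, newline_number() and newline_setting_is_valid() are hypothetical names used only for this illustration and are not part of the PCRE API.

```c
/* Assumed values, believed to match pcre.h; verify against your header. */
#define PCRE_NEWLINE_CR       0x00100000
#define PCRE_NEWLINE_LF       0x00200000
#define PCRE_NEWLINE_CRLF     0x00300000
#define PCRE_NEWLINE_ANY      0x00400000
#define PCRE_NEWLINE_ANYCRLF  0x00500000

/* Hypothetical mask covering the three newline bits of the options word. */
#define NEWLINE_MASK          0x00700000

/* Return the newline setting as a number in the range 0..7. */
static int newline_number(int options)
{
  return (options & NEWLINE_MASK) >> 20;
}

/* Return 1 if the number is one of the six in use (0 means "default"),
   or 0 if the combination of bits yields an unused number. */
static int newline_setting_is_valid(int options)
{
  return newline_number(options) <= 5;
}
```

With these definitions, PCRE_NEWLINE_CR | PCRE_NEWLINE_LF yields the same number as PCRE_NEWLINE_CRLF, whereas PCRE_NEWLINE_LF | PCRE_NEWLINE_ANY yields 6, which is one of the two unused values.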

-       The  only time that a line break is specially recognized when compiling
-       a pattern is if PCRE_EXTENDED is set, and  an  unescaped  #  outside  a
-       character  class  is  encountered.  This indicates a comment that lasts
-       until after the next line break sequence. In other circumstances,  line
-       break   sequences   are   treated  as  literal  data,  except  that  in
+       The only time that a line break is specially recognized when  compiling
+       a  pattern  is  if  PCRE_EXTENDED  is set, and an unescaped # outside a
+       character class is encountered. This indicates  a  comment  that  lasts
+       until  after the next line break sequence. In other circumstances, line
+       break  sequences  are  treated  as  literal  data,   except   that   in
        PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters
        and are therefore ignored.

@@ -1350,46 +1361,46 @@
          PCRE_NO_AUTO_CAPTURE

        If this option is set, it disables the use of numbered capturing paren-
-       theses  in the pattern. Any opening parenthesis that is not followed by
-       ? behaves as if it were followed by ?: but named parentheses can  still
-       be  used  for  capturing  (and  they acquire numbers in the usual way).
+       theses in the pattern. Any opening parenthesis that is not followed  by
+       ?  behaves as if it were followed by ?: but named parentheses can still
+       be used for capturing (and they acquire  numbers  in  the  usual  way).
        There is no equivalent of this option in Perl.

          PCRE_UNGREEDY

-       This option inverts the "greediness" of the quantifiers  so  that  they
-       are  not greedy by default, but become greedy if followed by "?". It is
-       not compatible with Perl. It can also be set by a (?U)  option  setting
+       This  option  inverts  the "greediness" of the quantifiers so that they
+       are not greedy by default, but become greedy if followed by "?". It  is
+       not  compatible  with Perl. It can also be set by a (?U) option setting
        within the pattern.

          PCRE_UTF8

-       This  option  causes PCRE to regard both the pattern and the subject as
-       strings of UTF-8 characters instead of single-byte  character  strings.
-       However,  it is available only when PCRE is built to include UTF-8 sup-
-       port. If not, the use of this option provokes an error. Details of  how
-       this  option  changes the behaviour of PCRE are given in the section on
+       This option causes PCRE to regard both the pattern and the  subject  as
+       strings  of  UTF-8 characters instead of single-byte character strings.
+       However, it is available only when PCRE is built to include UTF-8  sup-
+       port.  If not, the use of this option provokes an error. Details of how
+       this option changes the behaviour of PCRE are given in the  section  on
        UTF-8 support in the main pcre page.

          PCRE_NO_UTF8_CHECK

        When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
-       automatically  checked.  There  is  a  discussion about the validity of
-       UTF-8 strings in the main pcre page. If an invalid  UTF-8  sequence  of
-       bytes  is  found,  pcre_compile() returns an error. If you already know
+       automatically checked. There is a  discussion  about  the  validity  of
+       UTF-8  strings  in  the main pcre page. If an invalid UTF-8 sequence of
+       bytes is found, pcre_compile() returns an error. If  you  already  know
        that your pattern is valid, and you want to skip this check for perfor-
-       mance  reasons,  you  can set the PCRE_NO_UTF8_CHECK option. When it is
-       set, the effect of passing an invalid UTF-8  string  as  a  pattern  is
-       undefined.  It  may  cause your program to crash. Note that this option
-       can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress  the
+       mance reasons, you can set the PCRE_NO_UTF8_CHECK option.  When  it  is
+       set,  the  effect  of  passing  an invalid UTF-8 string as a pattern is
+       undefined. It may cause your program to crash. Note  that  this  option
+       can  also be passed to pcre_exec() and pcre_dfa_exec(), to suppress the
        UTF-8 validity checking of subject strings.

COMPILATION ERROR CODES

-       The  following  table  lists  the  error  codes that may be returned by
-       pcre_compile2(), along with the error messages that may be returned  by
-       both  compiling functions. As PCRE has developed, some error codes have
+       The following table lists the error  codes  that  may  be  returned  by
+       pcre_compile2(),  along with the error messages that may be returned by
+       both compiling functions. As PCRE has developed, some error codes  have
        fallen out of use. To avoid confusion, they have not been re-used.

           0  no error
@@ -1445,7 +1456,7 @@
          50  [this code is not in use]
          51  octal value is greater than \377 (not in UTF-8 mode)
          52  internal error: overran compiling workspace
-         53  internal  error:  previously-checked  referenced  subpattern  not
+         53   internal  error:  previously-checked  referenced  subpattern not
        found
          54  DEFINE group contains more than one branch
          55  repeating a DEFINE group is not allowed
@@ -1460,7 +1471,7 @@
          63  digit expected after (?+
          64  ] is an invalid data character in JavaScript compatibility mode

-       The  numbers  32  and 10000 in errors 48 and 49 are defaults; different
+       The numbers 32 and 10000 in errors 48 and 49  are  defaults;  different
        values may be used if the limits were changed when PCRE was built.

@@ -1469,32 +1480,32 @@
        pcre_extra *pcre_study(const pcre *code, int options,
             const char **errptr);

-       If a compiled pattern is going to be used several times,  it  is  worth
+       If  a  compiled  pattern is going to be used several times, it is worth
        spending more time analyzing it in order to speed up the time taken for
-       matching. The function pcre_study() takes a pointer to a compiled  pat-
+       matching.  The function pcre_study() takes a pointer to a compiled pat-
        tern as its first argument. If studying the pattern produces additional
-       information that will help speed up matching,  pcre_study()  returns  a
-       pointer  to a pcre_extra block, in which the study_data field points to
+       information  that  will  help speed up matching, pcre_study() returns a
+       pointer to a pcre_extra block, in which the study_data field points  to
        the results of the study.

        The  returned  value  from  pcre_study()  can  be  passed  directly  to
-       pcre_exec().  However,  a  pcre_extra  block also contains other fields
-       that can be set by the caller before the block  is  passed;  these  are
+       pcre_exec(). However, a pcre_extra block  also  contains  other  fields
+       that  can  be  set  by the caller before the block is passed; these are
        described below in the section on matching a pattern.

-       If  studying  the  pattern  does not produce any additional information
+       If studying the pattern does not  produce  any  additional  information
        pcre_study() returns NULL. In that circumstance, if the calling program
-       wants  to  pass  any of the other fields to pcre_exec(), it must set up
+       wants to pass any of the other fields to pcre_exec(), it  must  set  up
        its own pcre_extra block.

-       The second argument of pcre_study() contains option bits.  At  present,
+       The  second  argument of pcre_study() contains option bits. At present,
        no options are defined, and this argument should always be zero.

-       The  third argument for pcre_study() is a pointer for an error message.
-       If studying succeeds (even if no data is  returned),  the  variable  it
-       points  to  is  set  to NULL. Otherwise it is set to point to a textual
+       The third argument for pcre_study() is a pointer for an error  message.
+       If  studying  succeeds  (even  if no data is returned), the variable it
+       points to is set to NULL. Otherwise it is set to  point  to  a  textual
        error message. This is a static string that is part of the library. You
-       must  not  try  to  free it. You should test the error pointer for NULL
+       must not try to free it. You should test the  error  pointer  for  NULL
        after calling pcre_study(), to be sure that it has run successfully.

        This is a typical call to pcre_study():
@@ -1506,62 +1517,62 @@
            &error);        /* set to NULL or points to a message */

        At present, studying a pattern is useful only for non-anchored patterns
-       that  do not have a single fixed starting character. A bitmap of possi-
+       that do not have a single fixed starting character. A bitmap of  possi-
        ble starting bytes is created.
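
To show why a bitmap of starting bytes can speed up matching, here is a minimal C sketch. It is not PCRE's internal code: start_bitmap, bitmap_set(), bitmap_test() and first_candidate() are hypothetical names invented for this illustration. For a pattern such as cat|dog, the only possible starting bytes are "c" and "d", so a matcher can skip over every subject byte whose bit is not set.

```c
#include <string.h>

/* Hypothetical type: one bit for each of the 256 possible byte values. */
typedef struct {
  unsigned char bits[32];
} start_bitmap;

/* Mark every byte as an impossible starting byte. */
static void bitmap_clear(start_bitmap *map)
{
  memset(map->bits, 0, sizeof(map->bits));
}

/* Record that a match may begin with byte c. */
static void bitmap_set(start_bitmap *map, unsigned char c)
{
  map->bits[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Return 1 if a match may begin with byte c, otherwise 0. */
static int bitmap_test(const start_bitmap *map, unsigned char c)
{
  return (map->bits[c / 8] >> (c % 8)) & 1;
}

/* Return the offset of the first byte at which a match could start,
   or length if no byte in the subject is a possible starting byte. */
static size_t first_candidate(const start_bitmap *map,
                              const char *subject, size_t length)
{
  size_t i;
  for (i = 0; i < length; i++)
    if (bitmap_test(map, (unsigned char)subject[i])) return i;
  return length;
}
```

A pattern with a single fixed starting character needs no bitmap, because a plain search for that one character does the same job.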

LOCALE SUPPORT

-       PCRE handles caseless matching, and determines whether  characters  are
-       letters,  digits, or whatever, by reference to a set of tables, indexed
-       by character value. When running in UTF-8 mode, this  applies  only  to
-       characters  with  codes  less than 128. Higher-valued codes never match
-       escapes such as \w or \d, but can be tested with \p if  PCRE  is  built
-       with  Unicode  character property support. The use of locales with Uni-
-       code is discouraged. If you are handling characters with codes  greater
-       than  128, you should either use UTF-8 and Unicode, or use locales, but
+       PCRE  handles  caseless matching, and determines whether characters are
+       letters, digits, or whatever, by reference to a set of tables,  indexed
+       by  character  value.  When running in UTF-8 mode, this applies only to
+       characters with codes less than 128. Higher-valued  codes  never  match
+       escapes  such  as  \w or \d, but can be tested with \p if PCRE is built
+       with Unicode character property support. The use of locales  with  Uni-
+       code  is discouraged. If you are handling characters with codes greater
+       than 128, you should either use UTF-8 and Unicode, or use locales,  but
        not try to mix the two.
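
The idea of tables indexed by character value can be sketched in C as follows. This is an illustration only and does not reproduce the layout of the tables built by pcre_maketables(); the CLASS_ flags, build_class_table() and is_letter() are hypothetical names. The table is filled from the standard ctype functions, so its contents depend on the locale that is current when it is built.

```c
#include <ctype.h>

/* Hypothetical class flags, one bit per character class. */
#define CLASS_LETTER 0x01
#define CLASS_DIGIT  0x02
#define CLASS_SPACE  0x04

/* Fill a 256-entry table, indexed by character value, using the
   ctype functions of the current locale. */
static void build_class_table(unsigned char table[256])
{
  int c;
  for (c = 0; c < 256; c++) {
    table[c] = 0;
    if (isalpha(c)) table[c] |= CLASS_LETTER;
    if (isdigit(c)) table[c] |= CLASS_DIGIT;
    if (isspace(c)) table[c] |= CLASS_SPACE;
  }
}

/* Classify a character with a single table lookup. */
static int is_letter(const unsigned char table[256], unsigned char c)
{
  return (table[c] & CLASS_LETTER) != 0;
}
```

Calling setlocale() before build_class_table() would change which values above 127 are flagged as letters, which is the effect that locale-specific tables are intended to provide.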

-       PCRE contains an internal set of tables that are used  when  the  final
-       argument  of  pcre_compile()  is  NULL.  These  are sufficient for many
+       PCRE  contains  an  internal set of tables that are used when the final
+       argument of pcre_compile() is  NULL.  These  are  sufficient  for  many
        applications.  Normally, the internal tables recognize only ASCII char-
        acters. However, when PCRE is built, it is possible to cause the inter-
        nal tables to be rebuilt in the default "C" locale of the local system,
        which may cause them to be different.

-       The  internal tables can always be overridden by tables supplied by the
+       The internal tables can always be overridden by tables supplied by  the
        application that calls PCRE. These may be created in a different locale
-       from  the  default.  As more and more applications change to using Uni-
+       from the default. As more and more applications change  to  using  Uni-
        code, the need for this locale support is expected to die away.

-       External tables are built by calling  the  pcre_maketables()  function,
-       which  has no arguments, in the relevant locale. The result can then be
-       passed to pcre_compile() or pcre_exec()  as  often  as  necessary.  For
-       example,  to  build  and use tables that are appropriate for the French
-       locale (where accented characters with  values  greater  than  128  are
+       External  tables  are  built by calling the pcre_maketables() function,
+       which has no arguments, in the relevant locale. The result can then  be
+       passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For
+       example, to build and use tables that are appropriate  for  the  French
+       locale  (where  accented  characters  with  values greater than 128 are
        treated as letters), the following code could be used:

          setlocale(LC_CTYPE, "fr_FR");
          tables = pcre_maketables();
          re = pcre_compile(..., tables);

-       The  locale  name "fr_FR" is used on Linux and other Unix-like systems;
+       The locale name "fr_FR" is used on Linux and other  Unix-like  systems;
        if you are using Windows, the name for the French locale is "french".

-       When pcre_maketables() runs, the tables are built  in  memory  that  is
-       obtained  via  pcre_malloc. It is the caller's responsibility to ensure
-       that the memory containing the tables remains available for as long  as
+       When  pcre_maketables()  runs,  the  tables are built in memory that is
+       obtained via pcre_malloc. It is the caller's responsibility  to  ensure
+       that  the memory containing the tables remains available for as long as
        it is needed.

        The pointer that is passed to pcre_compile() is saved with the compiled
-       pattern, and the same tables are used via this pointer by  pcre_study()
+       pattern,  and the same tables are used via this pointer by pcre_study()
        and normally also by pcre_exec(). Thus, by default, for any single pat-
        tern, compilation, studying and matching all happen in the same locale,
        but different patterns can be compiled in different locales.

-       It  is  possible to pass a table pointer or NULL (indicating the use of
-       the internal tables) to pcre_exec(). Although  not  intended  for  this
-       purpose,  this facility could be used to match a pattern in a different
+       It is possible to pass a table pointer or NULL (indicating the  use  of
+       the  internal  tables)  to  pcre_exec(). Although not intended for this
+       purpose, this facility could be used to match a pattern in a  different
        locale from the one in which it was compiled. Passing table pointers at
        run time is discussed below in the section on matching a pattern.

@@ -1571,15 +1582,15 @@
        int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
             int what, void *where);

-       The  pcre_fullinfo() function returns information about a compiled pat-
+       The pcre_fullinfo() function returns information about a compiled  pat-
        tern. It replaces the obsolete pcre_info() function, which is neverthe-
        less retained for backwards compatibility (and is documented below).

-       The  first  argument  for  pcre_fullinfo() is a pointer to the compiled
-       pattern. The second argument is the result of pcre_study(), or NULL  if
-       the  pattern  was not studied. The third argument specifies which piece
-       of information is required, and the fourth argument is a pointer  to  a
-       variable  to  receive  the  data. The yield of the function is zero for
+       The first argument for pcre_fullinfo() is a  pointer  to  the  compiled
+       pattern.  The second argument is the result of pcre_study(), or NULL if
+       the pattern was not studied. The third argument specifies  which  piece
+       of  information  is required, and the fourth argument is a pointer to a
+       variable to receive the data. The yield of the  function  is  zero  for
        success, or one of the following negative numbers:

          PCRE_ERROR_NULL       the argument code was NULL
@@ -1587,9 +1598,9 @@
          PCRE_ERROR_BADMAGIC   the "magic number" was not found
          PCRE_ERROR_BADOPTION  the value of what was invalid

-       The "magic number" is placed at the start of each compiled  pattern  as
-       a   simple check against passing an arbitrary memory pointer. Here is a
-       typical call of pcre_fullinfo(), to obtain the length of  the  compiled
+       The  "magic  number" is placed at the start of each compiled pattern as
+       a  simple check against passing an arbitrary memory pointer. Here is  a
+       typical  call  of pcre_fullinfo(), to obtain the length of the compiled
        pattern:

          int rc;
@@ -1600,76 +1611,76 @@
            PCRE_INFO_SIZE,   /* what is required */
            &length);         /* where to put the data */

-       The  possible  values for the third argument are defined in pcre.h, and
+       The possible values for the third argument are defined in  pcre.h,  and
        are as follows:

          PCRE_INFO_BACKREFMAX

-       Return the number of the highest back reference  in  the  pattern.  The
-       fourth  argument  should  point to an int variable. Zero is returned if
+       Return  the  number  of  the highest back reference in the pattern. The
+       fourth argument should point to an int variable. Zero  is  returned  if
        there are no back references.

          PCRE_INFO_CAPTURECOUNT

-       Return the number of capturing subpatterns in the pattern.  The  fourth
+       Return  the  number of capturing subpatterns in the pattern. The fourth
        argument should point to an int variable.

          PCRE_INFO_DEFAULT_TABLES

-       Return  a pointer to the internal default character tables within PCRE.
-       The fourth argument should point to an unsigned char *  variable.  This
+       Return a pointer to the internal default character tables within  PCRE.
+       The  fourth  argument should point to an unsigned char * variable. This
        information call is provided for internal use by the pcre_study() func-
-       tion. External callers can cause PCRE to use  its  internal  tables  by
+       tion.  External  callers  can  cause PCRE to use its internal tables by
        passing a NULL table pointer.

          PCRE_INFO_FIRSTBYTE

-       Return  information  about  the first byte of any matched string, for a
-       non-anchored pattern. The fourth argument should point to an int  vari-
-       able.  (This option used to be called PCRE_INFO_FIRSTCHAR; the old name
+       Return information about the first byte of any matched  string,  for  a
+       non-anchored  pattern. The fourth argument should point to an int vari-
+       able. (This option used to be called PCRE_INFO_FIRSTCHAR; the old  name
        is still recognized for backwards compatibility.)

-       If there is a fixed first byte, for example, from  a  pattern  such  as
+       If  there  is  a  fixed first byte, for example, from a pattern such as
        (cat|cow|coyote), its value is returned. Otherwise, if either

-       (a)  the pattern was compiled with the PCRE_MULTILINE option, and every
+       (a) the pattern was compiled with the PCRE_MULTILINE option, and  every
        branch starts with "^", or

        (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
        set (if it were set, the pattern would be anchored),

-       -1  is  returned, indicating that the pattern matches only at the start
-       of a subject string or after any newline within the  string.  Otherwise
+       -1 is returned, indicating that the pattern matches only at  the  start
+       of  a  subject string or after any newline within the string. Otherwise
        -2 is returned. For anchored patterns, -2 is returned.

          PCRE_INFO_FIRSTTABLE

-       If  the pattern was studied, and this resulted in the construction of a
+       If the pattern was studied, and this resulted in the construction of  a
        256-bit table indicating a fixed set of bytes for the first byte in any
-       matching  string, a pointer to the table is returned. Otherwise NULL is
-       returned. The fourth argument should point to an unsigned char *  vari-
+       matching string, a pointer to the table is returned. Otherwise NULL  is
+       returned.  The fourth argument should point to an unsigned char * vari-
        able.

          PCRE_INFO_HASCRORLF

-       Return  1  if  the  pattern  contains any explicit matches for CR or LF
-       characters, otherwise 0. The fourth argument should  point  to  an  int
-       variable.  An explicit match is either a literal CR or LF character, or
+       Return 1 if the pattern contains any explicit  matches  for  CR  or  LF
+       characters,  otherwise  0.  The  fourth argument should point to an int
+       variable. An explicit match is either a literal CR or LF character,  or
        \r or \n.

          PCRE_INFO_JCHANGED

-       Return 1 if the (?J) or (?-J) option setting is used  in  the  pattern,
-       otherwise  0. The fourth argument should point to an int variable. (?J)
+       Return  1  if  the (?J) or (?-J) option setting is used in the pattern,
+       otherwise 0. The fourth argument should point to an int variable.  (?J)
        and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.

          PCRE_INFO_LASTLITERAL

-       Return the value of the rightmost literal byte that must exist  in  any
-       matched  string,  other  than  at  its  start,  if such a byte has been
+       Return  the  value of the rightmost literal byte that must exist in any
+       matched string, other than at its  start,  if  such  a  byte  has  been
        recorded. The fourth argument should point to an int variable. If there
-       is  no such byte, -1 is returned. For anchored patterns, a last literal
-       byte is recorded only if it follows something of variable  length.  For
+       is no such byte, -1 is returned. For anchored patterns, a last  literal
+       byte  is  recorded only if it follows something of variable length. For
        example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
        /^a\dz\d/ the returned value is -1.

@@ -1677,34 +1688,34 @@
          PCRE_INFO_NAMEENTRYSIZE
          PCRE_INFO_NAMETABLE

-       PCRE supports the use of named as well as numbered capturing  parenthe-
-       ses.  The names are just an additional way of identifying the parenthe-
+       PCRE  supports the use of named as well as numbered capturing parenthe-
+       ses. The names are just an additional way of identifying the  parenthe-
        ses, which still acquire numbers. Several convenience functions such as
-       pcre_get_named_substring()  are  provided  for extracting captured sub-
-       strings by name. It is also possible to extract the data  directly,  by
-       first  converting  the  name to a number in order to access the correct
+       pcre_get_named_substring() are provided for  extracting  captured  sub-
+       strings  by  name. It is also possible to extract the data directly, by
+       first converting the name to a number in order to  access  the  correct
        pointers in the output vector (described with pcre_exec() below). To do
-       the  conversion,  you  need  to  use  the  name-to-number map, which is
+       the conversion, you need  to  use  the  name-to-number  map,  which  is
        described by these three values.

        The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
        gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
-       of each entry; both of these  return  an  int  value.  The  entry  size
-       depends  on the length of the longest name. PCRE_INFO_NAMETABLE returns
-       a pointer to the first entry of the table  (a  pointer  to  char).  The
+       of  each  entry;  both  of  these  return  an int value. The entry size
+       depends on the length of the longest name. PCRE_INFO_NAMETABLE  returns
+       a  pointer  to  the  first  entry of the table (a pointer to char). The
        first two bytes of each entry are the number of the capturing parenthe-
-       sis, most significant byte first. The rest of the entry is  the  corre-
-       sponding  name,  zero  terminated. The names are in alphabetical order.
+       sis,  most  significant byte first. The rest of the entry is the corre-
+       sponding name, zero terminated. The names are  in  alphabetical  order.
        When PCRE_DUPNAMES is set, duplicate names are in order of their paren-
-       theses  numbers.  For  example,  consider the following pattern (assume
-       PCRE_EXTENDED is  set,  so  white  space  -  including  newlines  -  is
+       theses numbers. For example, consider  the  following  pattern  (assume
+       PCRE_EXTENDED  is  set,  so  white  space  -  including  newlines  - is
        ignored):

          (?<date> (?<year>(\d\d)?\d\d) -
          (?<month>\d\d) - (?<day>\d\d) )

-       There  are  four  named subpatterns, so the table has four entries, and
-       each entry in the table is eight bytes long. The table is  as  follows,
+       There are four named subpatterns, so the table has  four  entries,  and
+       each  entry  in the table is eight bytes long. The table is as follows,
        with non-printing bytes shown in hexadecimal, and undefined bytes shown
        as ??:

@@ -1713,17 +1724,18 @@
          00 04 m  o  n  t  h  00
          00 02 y  e  a  r  00 ??

-       When writing code to extract data  from  named  subpatterns  using  the
-       name-to-number  map,  remember that the length of the entries is likely
+       When  writing  code  to  extract  data from named subpatterns using the
+       name-to-number map, remember that the length of the entries  is  likely
        to be different for each compiled pattern.

          PCRE_INFO_OKPARTIAL

-       Return 1 if the pattern can be used for partial matching, otherwise  0.
-       The fourth argument should point to an int variable. From release 8.00,
-       this always returns 1, because the restrictions that previously applied
-       to  partial  matching  have  been lifted. The pcrepartial documentation
-       gives details of partial matching.
+       Return  1  if  the  pattern  can  be  used  for  partial  matching with
+       pcre_exec(), otherwise 0. The fourth argument should point  to  an  int
+       variable.  From  release  8.00,  this  always  returns  1,  because the
+       restrictions that previously applied  to  partial  matching  have  been
+       lifted.  The  pcrepartial documentation gives details of partial match-
+       ing.

          PCRE_INFO_OPTIONS

@@ -1909,8 +1921,8 @@
        PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in  the  flags  field.  If  the
        limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.

-       The  pcre_callout  field is used in conjunction with the "callout" fea-
-       ture, which is described in the pcrecallout documentation.
+       The  callout_data  field is used in conjunction with the "callout" fea-
+       ture, and is described in the pcrecallout documentation.

        The tables field  is  used  to  pass  a  character  tables  pointer  to
        pcre_exec();  this overrides the value that is stored with the compiled
@@ -1927,22 +1939,23 @@

        The  unused  bits of the options argument for pcre_exec() must be zero.
        The only bits that may  be  set  are  PCRE_ANCHORED,  PCRE_NEWLINE_xxx,
-       PCRE_NOTBOL,    PCRE_NOTEOL,   PCRE_NOTEMPTY,   PCRE_NO_START_OPTIMIZE,
-       PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and PCRE_PARTIAL_HARD.
+       PCRE_NOTBOL,    PCRE_NOTEOL,    PCRE_NOTEMPTY,   PCRE_NOTEMPTY_ATSTART,
+       PCRE_NO_START_OPTIMIZE,  PCRE_NO_UTF8_CHECK,   PCRE_PARTIAL_SOFT,   and
+       PCRE_PARTIAL_HARD.

          PCRE_ANCHORED

-       The PCRE_ANCHORED option limits pcre_exec() to matching  at  the  first
-       matching  position.  If  a  pattern was compiled with PCRE_ANCHORED, or
-       turned out to be anchored by virtue of its contents, it cannot be  made
+       The  PCRE_ANCHORED  option  limits pcre_exec() to matching at the first
+       matching position. If a pattern was  compiled  with  PCRE_ANCHORED,  or
+       turned  out to be anchored by virtue of its contents, it cannot be made
        unanchored at matching time.

          PCRE_BSR_ANYCRLF
          PCRE_BSR_UNICODE

        These options (which are mutually exclusive) control what the \R escape
-       sequence matches. The choice is either to match only CR, LF,  or  CRLF,
-       or  to  match  any Unicode newline sequence. These options override the
+       sequence  matches.  The choice is either to match only CR, LF, or CRLF,
+       or to match any Unicode newline sequence. These  options  override  the
        choice that was made or defaulted when the pattern was compiled.

          PCRE_NEWLINE_CR
@@ -1951,77 +1964,84 @@
          PCRE_NEWLINE_ANYCRLF
          PCRE_NEWLINE_ANY

-       These options override  the  newline  definition  that  was  chosen  or
-       defaulted  when the pattern was compiled. For details, see the descrip-
-       tion of pcre_compile()  above.  During  matching,  the  newline  choice
-       affects  the  behaviour  of the dot, circumflex, and dollar metacharac-
-       ters. It may also alter the way the match position is advanced after  a
+       These  options  override  the  newline  definition  that  was chosen or
+       defaulted when the pattern was compiled. For details, see the  descrip-
+       tion  of  pcre_compile()  above.  During  matching,  the newline choice
+       affects the behaviour of the dot, circumflex,  and  dollar  metacharac-
+       ters.  It may also alter the way the match position is advanced after a
        match failure for an unanchored pattern.

-       When  PCRE_NEWLINE_CRLF,  PCRE_NEWLINE_ANYCRLF,  or PCRE_NEWLINE_ANY is
-       set, and a match attempt for an unanchored pattern fails when the  cur-
-       rent  position  is  at  a  CRLF  sequence,  and the pattern contains no
-       explicit matches for  CR  or  LF  characters,  the  match  position  is
+       When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF,  or  PCRE_NEWLINE_ANY  is
+       set,  and a match attempt for an unanchored pattern fails when the cur-
+       rent position is at a  CRLF  sequence,  and  the  pattern  contains  no
+       explicit  matches  for  CR  or  LF  characters,  the  match position is
        advanced by two characters instead of one, in other words, to after the
        CRLF.

        The above rule is a compromise that makes the most common cases work as
-       expected.  For  example,  if  the  pattern  is .+A (and the PCRE_DOTALL
+       expected. For example, if the  pattern  is  .+A  (and  the  PCRE_DOTALL
        option is not set), it does not match the string "\r\nA" because, after
-       failing  at the start, it skips both the CR and the LF before retrying.
-       However, the pattern [\r\n]A does match that string,  because  it  con-
+       failing at the start, it skips both the CR and the LF before  retrying.
+       However,  the  pattern  [\r\n]A does match that string, because it con-
        tains an explicit CR or LF reference, and so advances only by one char-
        acter after the first failure.

        An explicit match for CR or LF is either a literal appearance of one of
-       those  characters,  or  one  of the \r or \n escape sequences. Implicit
-       matches such as [^X] do not count, nor does \s (which includes  CR  and
+       those characters, or one of the \r or  \n  escape  sequences.  Implicit
+       matches  such  as [^X] do not count, nor does \s (which includes CR and
        LF in the characters that it matches).

-       Notwithstanding  the above, anomalous effects may still occur when CRLF
+       Notwithstanding the above, anomalous effects may still occur when  CRLF
        is a valid newline sequence and explicit \r or \n escapes appear in the
        pattern.

          PCRE_NOTBOL

        This option specifies that the first character of the subject string is not
-       the beginning of a line, so the  circumflex  metacharacter  should  not
-       match  before it. Setting this without PCRE_MULTILINE (at compile time)
-       causes circumflex never to match. This option affects only  the  behav-
+       the  beginning  of  a  line, so the circumflex metacharacter should not
+       match before it. Setting this without PCRE_MULTILINE (at compile  time)
+       causes  circumflex  never to match. This option affects only the behav-
        iour of the circumflex metacharacter. It does not affect \A.

          PCRE_NOTEOL

        This option specifies that the end of the subject string is not the end
-       of a line, so the dollar metacharacter should not match it nor  (except
-       in  multiline mode) a newline immediately before it. Setting this with-
+       of  a line, so the dollar metacharacter should not match it nor (except
+       in multiline mode) a newline immediately before it. Setting this  with-
        out PCRE_MULTILINE (at compile time) causes dollar never to match. This
-       option  affects only the behaviour of the dollar metacharacter. It does
+       option affects only the behaviour of the dollar metacharacter. It  does
        not affect \Z or \z.

          PCRE_NOTEMPTY

        An empty string is not considered to be a valid match if this option is
-       set.  If  there are alternatives in the pattern, they are tried. If all
-       the alternatives match the empty string, the entire  match  fails.  For
+       set. If there are alternatives in the pattern, they are tried.  If  all
+       the  alternatives  match  the empty string, the entire match fails. For
        example, if the pattern

          a?b?

-       is  applied  to  a string not beginning with "a" or "b", it matches the
-       empty string at the start of the subject. With PCRE_NOTEMPTY set,  this
+       is applied to a string not beginning with "a" or  "b",  it  matches  an
+       empty  string at the start of the subject. With PCRE_NOTEMPTY set, this
        match is not valid, so PCRE searches further into the string for occur-
        rences of "a" or "b".

-       Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe-
-       cial  case  of  a  pattern match of the empty string within its split()
-       function, and when using the /g modifier. It  is  possible  to  emulate
-       Perl's behaviour after matching a null string by first trying the match
-       again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then
-       if  that  fails by advancing the starting offset (see below) and trying
-       an ordinary match again. There is some code that demonstrates how to do
-       this in the pcredemo sample program.
+         PCRE_NOTEMPTY_ATSTART

+       This  is  like PCRE_NOTEMPTY, except that an empty string match that is
+       not at the start of  the  subject  is  permitted.  If  the  pattern  is
+       anchored, such a match can occur only if the pattern contains \K.
+
+       Perl     has    no    direct    equivalent    of    PCRE_NOTEMPTY    or
+       PCRE_NOTEMPTY_ATSTART, but it does make a special  case  of  a  pattern
+       match  of  the empty string within its split() function, and when using
+       the /g modifier. It is  possible  to  emulate  Perl's  behaviour  after
+       matching a null string by first trying the match again at the same off-
+       set with PCRE_NOTEMPTY_ATSTART and  PCRE_ANCHORED,  and  then  if  that
+       fails, by advancing the starting offset (see below) and trying an ordi-
+       nary match again. There is some code that demonstrates how to  do  this
+       in the pcredemo sample program.
+
          PCRE_NO_START_OPTIMIZE

        There  are a number of optimizations that pcre_exec() uses at the start
@@ -2066,9 +2086,9 @@
        returns  PCRE_ERROR_PARTIAL.  Otherwise,  if  PCRE_PARTIAL_SOFT is set,
        matching continues by testing any other alternatives. Only if they  all
        fail  is  PCRE_ERROR_PARTIAL  returned (instead of PCRE_ERROR_NOMATCH).
-       The portion of the string that provided the partial match is set as the
-       first  matching  string.  There  is  a  more detailed discussion in the
-       pcrepartial documentation.
+       The portion of the string that was inspected when the partial match was
+       found  is  set  as  the first matching string. There is a more detailed
+       discussion in the pcrepartial documentation.

    The string to be matched by pcre_exec()

@@ -2484,19 +2504,20 @@
        characteristics to the normal algorithm, and  is  not  compatible  with
        Perl.  Some  of the features of PCRE patterns are not supported. Never-
        theless, there are times when this kind of matching can be useful.  For
-       a discussion of the two matching algorithms, see the pcrematching docu-
-       mentation.
+       a  discussion  of  the  two matching algorithms, and a list of features
+       that pcre_dfa_exec() does not support, see the pcrematching  documenta-
+       tion.

-       The arguments for the pcre_dfa_exec() function  are  the  same  as  for
+       The  arguments  for  the  pcre_dfa_exec()  function are the same as for
        pcre_exec(), plus two extras. The ovector argument is used in a differ-
-       ent way, and this is described below. The other  common  arguments  are
-       used  in  the  same way as for pcre_exec(), so their description is not
+       ent  way,  and  this is described below. The other common arguments are
+       used in the same way as for pcre_exec(), so their  description  is  not
        repeated here.

-       The two additional arguments provide workspace for  the  function.  The
-       workspace  vector  should  contain at least 20 elements. It is used for
+       The  two  additional  arguments provide workspace for the function. The
+       workspace vector should contain at least 20 elements. It  is  used  for
        keeping  track  of  multiple  paths  through  the  pattern  tree.  More
-       workspace  will  be  needed for patterns and subjects where there are a
+       workspace will be needed for patterns and subjects where  there  are  a
        lot of potential matches.

        Here is an example of a simple call to pcre_dfa_exec():
@@ -2518,12 +2539,13 @@

    Option bits for pcre_dfa_exec()

-       The unused bits of the options argument  for  pcre_dfa_exec()  must  be
-       zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW-
-       LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,  PCRE_NO_UTF8_CHECK,
-       PCRE_PARTIAL_HARD,     PCRE_PARTIAL_SOFT,     PCRE_DFA_SHORTEST,    and
-       PCRE_DFA_RESTART. All but the last four of these are exactly  the  same
-       as for pcre_exec(), so their description is not repeated here.
+       The  unused  bits  of  the options argument for pcre_dfa_exec() must be
+       zero. The only bits  that  may  be  set  are  PCRE_ANCHORED,  PCRE_NEW-
+       LINE_xxx,        PCRE_NOTBOL,        PCRE_NOTEOL,        PCRE_NOTEMPTY,
+       PCRE_NOTEMPTY_ATSTART, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, PCRE_PAR-
+       TIAL_SOFT,  PCRE_DFA_SHORTEST,  and  PCRE_DFA_RESTART. All but the last
+       four of these are  exactly  the  same  as  for  pcre_exec(),  so  their
+       description is not repeated here.

          PCRE_PARTIAL_HARD
          PCRE_PARTIAL_SOFT
@@ -2537,8 +2559,8 @@
        code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end
        of the subject is reached, there have been  no  complete  matches,  but
        there  is  still  at least one matching possibility. The portion of the
-       string that provided the longest partial match  is  set  as  the  first
-       matching string in both cases.
+       string that was inspected when the longest partial match was  found  is
+       set as the first matching string in both cases.

          PCRE_DFA_SHORTEST

@@ -2644,7 +2666,7 @@

REVISION

-       Last updated: 01 September 2009
+       Last updated: 11 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -2869,7 +2891,11 @@
        is  built  with Unicode character property support. The properties that
        can be tested with \p and \P are limited to the general category  prop-
        erties  such  as  Lu and Nd, script names such as Greek or Han, and the
-       derived properties Any and L&.
+       derived properties Any and L&. PCRE does  support  the  Cs  (surrogate)
+       property,  which  Perl  does  not; the Perl documentation says "Because
+       Perl hides the need for the user to understand the internal representa-
+       tion  of Unicode characters, there is no need to implement the somewhat
+       messy concept of surrogates."

        7. PCRE does support the \Q...\E escape for quoting substrings. Charac-
        ters  in  between  are  treated as literals. This is slightly different
@@ -2889,13 +2915,15 @@

        8. Fairly obviously, PCRE does not support the (?{code}) and (??{code})
        constructions. However, there is support for recursive  patterns.  This
-       is  not available in Perl 5.8, but will be in Perl 5.10. Also, the PCRE
+       is  not  available  in Perl 5.8, but it is in Perl 5.10. Also, the PCRE
        "callout" feature allows an external function to be called during  pat-
        tern matching. See the pcrecallout documentation for details.

        9.  Subpatterns  that  are  called  recursively or as "subroutines" are
        always treated as atomic groups in  PCRE.  This  is  like  Python,  but
-       unlike Perl.
+       unlike  Perl. There is a discussion of an example that explains this in
+       more detail in the section on recursion differences from  Perl  in  the
+       pcrecompat page.

        10.  There are some differences that are concerned with the settings of
        captured strings when part of  a  pattern  is  repeated.  For  example,
@@ -2904,9 +2932,7 @@

        11.  PCRE  does  support  Perl  5.10's  backtracking  verbs  (*ACCEPT),
        (*FAIL),  (*F),  (*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but only in
-       the forms without an  argument.  PCRE  does  not  support  (*MARK).  If
-       (*ACCEPT)  is within capturing parentheses, PCRE does not set that cap-
-       ture group; this is different to Perl.
+       the forms without an argument. PCRE does not support (*MARK).

        12. PCRE provides some extensions to the Perl regular expression facil-
        ities.   Perl  5.10  will  include new features that are not in earlier
@@ -2931,10 +2957,11 @@
        (e) PCRE_ANCHORED can be used at matching time to force a pattern to be
        tried only at the first matching position in the subject string.

-       (f)  The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAP-
-       TURE options for pcre_exec() have no Perl equivalents.
+       (f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
+       and PCRE_NO_AUTO_CAPTURE options for pcre_exec() have no  Perl  equiva-
+       lents.

-       (g) The \R escape sequence can be restricted to match only CR,  LF,  or
+       (g)  The  \R escape sequence can be restricted to match only CR, LF, or
        CRLF by the PCRE_BSR_ANYCRLF option.

        (h) The callout facility is PCRE-specific.
@@ -2944,10 +2971,10 @@
        (j) Patterns compiled by PCRE can be saved and re-used at a later time,
        even on different hosts that have the other endianness.

-       (k) The alternative matching function (pcre_dfa_exec())  matches  in  a
+       (k)  The  alternative  matching function (pcre_dfa_exec()) matches in a
        different way and is not Perl-compatible.

-       (l)  PCRE  recognizes some special sequences such as (*CR) at the start
+       (l) PCRE recognizes some special sequences such as (*CR) at  the  start
        of a pattern that set overall options that cannot be changed within the
        pattern.

@@ -2961,7 +2988,7 @@

REVISION

-       Last updated: 25 August 2009
+       Last updated: 18 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -3480,9 +3507,9 @@
        U+D800 to U+DFFF. Such characters are not valid in UTF-8  strings  (see
        RFC 3629) and so cannot be tested by PCRE, unless UTF-8 validity check-
        ing has been turned off (see the discussion  of  PCRE_NO_UTF8_CHECK  in
-       the pcreapi page).
+       the pcreapi page). Perl does not support the Cs property.

-       The  long  synonyms  for  these  properties that Perl supports (such as
+       The  long  synonyms  for  property  names  that  Perl supports (such as
        \p{Letter}) are not supported by PCRE, nor is it  permitted  to  prefix
        any of these properties with "Is".

@@ -4707,8 +4734,8 @@
        Obviously, PCRE cannot support the interpolation of Perl code. Instead,
        it  supports  special  syntax  for recursion of the entire pattern, and
        also for individual subpattern recursion.  After  its  introduction  in
-       PCRE  and  Python,  this  kind of recursion was introduced into Perl at
-       release 5.10.
+       PCRE  and  Python,  this  kind of recursion was subsequently introduced
+       into Perl at release 5.10.

        A special item that consists of (? followed by a  number  greater  than
        zero and a closing parenthesis is a recursive call of the subpattern of
@@ -4717,105 +4744,166 @@
        tion.) The special item (?R) or (?0) is a recursive call of the  entire
        regular expression.

-       In  PCRE (like Python, but unlike Perl), a recursive subpattern call is
-       always treated as an atomic group. That is, once it has matched some of
-       the subject string, it is never re-entered, even if it contains untried
-       alternatives and there is a subsequent matching failure.
-
-       This PCRE pattern solves the nested  parentheses  problem  (assume  the
+       This  PCRE  pattern  solves  the nested parentheses problem (assume the
        PCRE_EXTENDED option is set so that white space is ignored):

          \( ( (?>[^()]+) | (?R) )* \)

-       First  it matches an opening parenthesis. Then it matches any number of
-       substrings which can either be a  sequence  of  non-parentheses,  or  a
-       recursive  match  of the pattern itself (that is, a correctly parenthe-
+       First it matches an opening parenthesis. Then it matches any number  of
+       substrings  which  can  either  be  a sequence of non-parentheses, or a
+       recursive match of the pattern itself (that is, a  correctly  parenthe-
        sized substring).  Finally there is a closing parenthesis.

-       If this were part of a larger pattern, you would not  want  to  recurse
+       If  this  were  part of a larger pattern, you would not want to recurse
        the entire pattern, so instead you could use this:

          ( \( ( (?>[^()]+) | (?1) )* \) )

-       We  have  put the pattern into parentheses, and caused the recursion to
+       We have put the pattern into parentheses, and caused the  recursion  to
        refer to them instead of the whole pattern.

-       In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
-       tricky.  This is made easier by the use of relative references. (A Perl
-       5.10 feature.)  Instead of (?1) in the  pattern  above  you  can  write
+       In  a  larger  pattern,  keeping  track  of  parenthesis numbers can be
+       tricky. This is made easier by the use of relative references. (A  Perl
+       5.10  feature.)   Instead  of  (?1)  in the pattern above you can write
        (?-2) to refer to the second most recently opened parentheses preceding
-       the recursion. In other  words,  a  negative  number  counts  capturing
+       the  recursion.  In  other  words,  a  negative number counts capturing
        parentheses leftwards from the point at which it is encountered.

-       It  is  also  possible  to refer to subsequently opened parentheses, by
-       writing references such as (?+2). However, these  cannot  be  recursive
-       because  the  reference  is  not inside the parentheses that are refer-
-       enced. They are always "subroutine" calls, as  described  in  the  next
+       It is also possible to refer to  subsequently  opened  parentheses,  by
+       writing  references  such  as (?+2). However, these cannot be recursive
+       because the reference is not inside the  parentheses  that  are  refer-
+       enced.  They  are  always  "subroutine" calls, as described in the next
        section.

-       An  alternative  approach is to use named parentheses instead. The Perl
-       syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also
+       An alternative approach is to use named parentheses instead.  The  Perl
+       syntax  for  this  is (?&name); PCRE's earlier syntax (?P>name) is also
        supported. We could rewrite the above example as follows:

          (?<pn> \( ( (?>[^()]+) | (?&pn) )* \) )

-       If  there  is more than one subpattern with the same name, the earliest
+       If there is more than one subpattern with the same name,  the  earliest
        one is used.

-       This particular example pattern that we have been looking  at  contains
-       nested  unlimited repeats, and so the use of atomic grouping for match-
-       ing strings of non-parentheses is important when applying  the  pattern
+       This  particular  example pattern that we have been looking at contains
+       nested unlimited repeats, and so the use of atomic grouping for  match-
+       ing  strings  of non-parentheses is important when applying the pattern
        to strings that do not match. For example, when this pattern is applied
        to

          (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()

-       it yields "no match" quickly. However, if atomic grouping is not  used,
-       the  match  runs  for a very long time indeed because there are so many
-       different ways the + and * repeats can carve up the  subject,  and  all
+       it  yields "no match" quickly. However, if atomic grouping is not used,
+       the match runs for a very long time indeed because there  are  so  many
+       different  ways  the  + and * repeats can carve up the subject, and all
        have to be tested before failure can be reported.

        At the end of a match, the values set for any capturing subpatterns are
        those from the outermost level of the recursion at which the subpattern
-       value  is  set.   If  you want to obtain intermediate values, a callout
-       function can be used (see below and the pcrecallout documentation).  If
+       value is set.  If you want to obtain  intermediate  values,  a  callout
+       function  can be used (see below and the pcrecallout documentation). If
        the pattern above is matched against

          (ab(cd)ef)

-       the  value  for  the  capturing  parentheses is "ef", which is the last
-       value taken on at the top level. If additional parentheses  are  added,
+       the value for the capturing parentheses is  "ef",  which  is  the  last
+       value  taken  on at the top level. If additional parentheses are added,
        giving

          \( ( ( (?>[^()]+) | (?R) )* ) \)
             ^                        ^
             ^                        ^

-       the  string  they  capture is "ab(cd)ef", the contents of the top level
-       parentheses. If there are more than 15 capturing parentheses in a  pat-
+       the string they capture is "ab(cd)ef", the contents of  the  top  level
+       parentheses.  If there are more than 15 capturing parentheses in a pat-
        tern, PCRE has to obtain extra memory to store data during a recursion,
-       which it does by using pcre_malloc, freeing  it  via  pcre_free  after-
-       wards.  If  no  memory  can  be  obtained,  the  match  fails  with the
+       which  it  does  by  using pcre_malloc, freeing it via pcre_free after-
+       wards. If  no  memory  can  be  obtained,  the  match  fails  with  the
        PCRE_ERROR_NOMEMORY error.

-       Do not confuse the (?R) item with the condition (R),  which  tests  for
-       recursion.   Consider  this pattern, which matches text in angle brack-
-       ets, allowing for arbitrary nesting. Only digits are allowed in  nested
-       brackets  (that is, when recursing), whereas any characters are permit-
+       Do  not  confuse  the (?R) item with the condition (R), which tests for
+       recursion.  Consider this pattern, which matches text in  angle  brack-
+       ets,  allowing for arbitrary nesting. Only digits are allowed in nested
+       brackets (that is, when recursing), whereas any characters are  permit-
        ted at the outer level.

          < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >

-       In this pattern, (?(R) is the start of a conditional  subpattern,  with
-       two  different  alternatives for the recursive and non-recursive cases.
+       In  this  pattern, (?(R) is the start of a conditional subpattern, with
+       two different alternatives for the recursive and  non-recursive  cases.
        The (?R) item is the actual recursive call.

+   Recursion difference from Perl

+       In  PCRE (like Python, but unlike Perl), a recursive subpattern call is
+       always treated as an atomic group. That is, once it has matched some of
+       the subject string, it is never re-entered, even if it contains untried
+       alternatives and there is a subsequent matching failure.  This  can  be
+       illustrated  by the following pattern, which purports to match a palin-
+       dromic string that contains an odd number of characters  (for  example,
+       "a", "aba", "abcba", "abcdcba"):
+
+         ^(.|(.)(?1)\2)$
+
+       The idea is that it either matches a single character, or two identical
+       characters surrounding a sub-palindrome. In Perl, this  pattern  works;
+       in PCRE it does not if the subject string is longer than three
+       characters. Consider the subject string "abcba":
+
+       At the top level, the first character is matched, but as it is  not  at
+       the end of the string, the first alternative fails; the second alterna-
+       tive is taken and the recursion kicks in. The recursive call to subpat-
+       tern  1  successfully  matches the next character ("b"). (Note that the
+       beginning and end of line tests are not part of the recursion.)
+
+       Back at the top level, the next character ("c") is compared  with  what
+       subpattern  2 matched, which was "a". This fails. Because the recursion
+       is treated as an atomic group, there are now  no  backtracking  points,
+       and  so  the  entire  match fails. (Perl is able, at this point, to re-
+       enter the recursion and try the second alternative.)  However,  if  the
+       pattern is written with the alternatives in the other order, things are
+       different:
+
+         ^((.)(?1)\2|.)$
+
+       This time, the recursing alternative is tried first, and  continues  to
+       recurse  until  it runs out of characters, at which point the recursion
+       fails. But this time we do have  another  alternative  to  try  at  the
+       higher  level.  That  is  the  big difference: in the previous case the
+       remaining alternative is at a deeper recursion level, which PCRE cannot
+       use.
+
+       To change the pattern so that it matches all palindromic  strings,  not
+       just those with an odd number of characters, it is tempting to change
+       the pattern to this:
+
+         ^((.)(?1)\2|.?)$
+
+       Again,  this  works  in Perl, but not in PCRE, and for the same reason.
+       When a deeper recursion has matched a single character,  it  cannot  be
+       entered  again  in  order  to match an empty string. The solution is to
+       separate the two cases, and write out the odd and even cases as  alter-
+       natives at the higher level:
+
+         ^(?:((.)(?1)\2|)|((.)(?3)\4|.))
+
+       If  you  want  to match typical palindromic phrases, the pattern has to
+       ignore all non-word characters, which can be done like this:
+
+         ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
+
+       If run with the PCRE_CASELESS option, this pattern matches phrases such
+       as "A man, a plan, a canal: Panama!" and it works well in both PCRE and
+       Perl. Note the use of the possessive quantifier *+ to avoid  backtrack-
+       ing  into  sequences of non-word characters. Without this, PCRE takes a
+       great deal longer (ten times or more) to  match  typical  phrases,  and
+       Perl takes so long that you think it has gone into a loop.
+
+
 SUBPATTERNS AS SUBROUTINES

        If the syntax for a recursive subpattern reference (either by number or
-       by  name)  is used outside the parentheses to which it refers, it oper-
-       ates like a subroutine in a programming language. The "called"  subpat-
+       by name) is used outside the parentheses to which it refers,  it  oper-
+       ates  like a subroutine in a programming language. The "called" subpat-
        tern may be defined before or after the reference. A numbered reference
        can be absolute or relative, as in these examples:

@@ -4827,101 +4915,106 @@

          (sens|respons)e and \1ibility

-       matches "sense and sensibility" and "response and responsibility",  but
+       matches  "sense and sensibility" and "response and responsibility", but
        not "sense and responsibility". If instead the pattern

          (sens|respons)e and (?1)ibility

-       is  used, it does match "sense and responsibility" as well as the other
-       two strings. Another example is  given  in  the  discussion  of  DEFINE
+       is used, it does match "sense and responsibility" as well as the  other
+       two  strings.  Another  example  is  given  in the discussion of DEFINE
        above.

        Like recursive subpatterns, a "subroutine" call is always treated as an
-       atomic group. That is, once it has matched some of the subject  string,
-       it  is  never  re-entered, even if it contains untried alternatives and
+       atomic  group. That is, once it has matched some of the subject string,
+       it is never re-entered, even if it contains  untried  alternatives  and
        there is a subsequent matching failure.

-       When a subpattern is used as a subroutine, processing options  such  as
+       When  a  subpattern is used as a subroutine, processing options such as
        case-independence are fixed when the subpattern is defined. They cannot
        be changed for different calls. For example, consider this pattern:

          (abc)(?i:(?-1))

-       It matches "abcabc". It does not match "abcABC" because the  change  of
+       It  matches  "abcabc". It does not match "abcABC" because the change of
        processing option does not affect the called subpattern.

ONIGURUMA SUBROUTINE SYNTAX

-       For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
+       For compatibility with Oniguruma, the non-Perl syntax \g followed by  a
        name or a number enclosed either in angle brackets or single quotes, is
-       an  alternative  syntax  for  referencing a subpattern as a subroutine,
-       possibly recursively. Here are two of the examples used above,  rewrit-
+       an alternative syntax for referencing a  subpattern  as  a  subroutine,
+       possibly  recursively. Here are two of the examples used above, rewrit-
        ten using this syntax:

          (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
          (sens|respons)e and \g'1'ibility

-       PCRE  supports  an extension to Oniguruma: if a number is preceded by a
+       PCRE supports an extension to Oniguruma: if a number is preceded  by  a
        plus or a minus sign it is taken as a relative reference. For example:

          (abc)(?i:\g<-1>)

-       Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not
-       synonymous.  The former is a back reference; the latter is a subroutine
+       Note  that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not
+       synonymous. The former is a back reference; the latter is a  subroutine
        call.

CALLOUTS

        Perl has a feature whereby using the sequence (?{...}) causes arbitrary
-       Perl  code to be obeyed in the middle of matching a regular expression.
+       Perl code to be obeyed in the middle of matching a regular  expression.
        This makes it possible, amongst other things, to extract different sub-
        strings that match the same pair of parentheses when there is a repeti-
        tion.

        PCRE provides a similar feature, but of course it cannot obey arbitrary
        Perl code. The feature is called "callout". The caller of PCRE provides
-       an external function by putting its entry point in the global  variable
-       pcre_callout.   By default, this variable contains NULL, which disables
+       an  external function by putting its entry point in the global variable
+       pcre_callout.  By default, this variable contains NULL, which  disables
        all calling out.

-       Within a regular expression, (?C) indicates the  points  at  which  the
-       external  function  is  to be called. If you want to identify different
-       callout points, you can put a number less than 256 after the letter  C.
-       The  default  value is zero.  For example, this pattern has two callout
+       Within  a  regular  expression,  (?C) indicates the points at which the
+       external function is to be called. If you want  to  identify  different
+       callout  points, you can put a number less than 256 after the letter C.
+       The default value is zero.  For example, this pattern has  two  callout
        points:

          (?C1)abc(?C2)def

        If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are
-       automatically  installed  before each item in the pattern. They are all
+       automatically installed before each item in the pattern. They  are  all
        numbered 255.

        During matching, when PCRE reaches a callout point (and pcre_callout is
-       set),  the  external function is called. It is provided with the number
-       of the callout, the position in the pattern, and, optionally, one  item
-       of  data  originally supplied by the caller of pcre_exec(). The callout
-       function may cause matching to proceed, to backtrack, or to fail  alto-
+       set), the external function is called. It is provided with  the  number
+       of  the callout, the position in the pattern, and, optionally, one item
+       of data originally supplied by the caller of pcre_exec().  The  callout
+       function  may cause matching to proceed, to backtrack, or to fail alto-
        gether. A complete description of the interface to the callout function
        is given in the pcrecallout documentation.

BACKTRACKING CONTROL

-       Perl 5.10 introduced a number of "Special Backtracking Control  Verbs",
+       Perl  5.10 introduced a number of "Special Backtracking Control Verbs",
        which are described in the Perl documentation as "experimental and sub-
-       ject to change or removal in a future version of Perl". It goes  on  to
-       say:  "Their usage in production code should be noted to avoid problems
+       ject  to  change or removal in a future version of Perl". It goes on to
+       say: "Their usage in production code should be noted to avoid  problems
        during upgrades." The same remarks apply to the PCRE features described
        in this section.

-       Since  these  verbs  are  specifically related to backtracking, most of
-       them can be  used  only  when  the  pattern  is  to  be  matched  using
+       Since these verbs are specifically related  to  backtracking,  most  of
+       them  can  be  used  only  when  the  pattern  is  to  be matched using
        pcre_exec(), which uses a backtracking algorithm. With the exception of
        (*FAIL), which behaves like a failing negative assertion, they cause an
        error if encountered by pcre_dfa_exec().

+       If any of these verbs are used in an assertion subpattern, their effect
+       is  confined  to that subpattern; it does not extend to the surrounding
+       pattern.  Note that assertion subpatterns are processed as anchored  at
+       the point where they are tested.
+
        The  new verbs make use of what was previously invalid syntax: an open-
        ing parenthesis followed by an asterisk. In Perl, they are generally of
        the form (*VERB:ARG) but PCRE does not support the use of arguments, so
@@ -4936,14 +5029,14 @@

        This  verb causes the match to end successfully, skipping the remainder
        of the pattern. When inside a recursion, only the innermost pattern  is
-       ended  immediately.  PCRE  differs  from  Perl  in  what happens if the
-       (*ACCEPT) is inside capturing parentheses. In Perl, the data so far  is
-       captured: in PCRE no data is captured. For example:
+       ended  immediately.  If  the (*ACCEPT) is inside capturing parentheses,
+       the data so far is captured. (This feature was added to PCRE at release
+       8.00.) For example:

-         A(A|B(*ACCEPT)|C)D
+         A((?:A|B(*ACCEPT)|C)D)

-       This  matches  "AB", "AAD", or "ACD", but when it matches "AB", no data
-       is captured.
+       This  matches  "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
+       tured by the outer parentheses.

          (*FAIL) or (*F)

@@ -5039,7 +5132,7 @@

REVISION

-       Last updated: 11 April 2009
+       Last updated: 18 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -5453,18 +5546,34 @@
        If PCRE_PARTIAL_SOFT is set,  the  partial  match  is  remembered,  but
        matching continues as normal, and other alternatives in the pattern are
        tried.  If  no  complete  match  can  be  found,  pcre_exec()   returns
-       PCRE_ERROR_PARTIAL  instead  of PCRE_ERROR_NOMATCH, and if there are at
-       least two slots in the offsets vector, they are filled in with the off-
-       sets  of  the longest string that partially matched. Consider this pat-
-       tern:
+       PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH. If there are at least
+       two slots in the offsets vector, the first of them is set to the offset
+       of the earliest character that was inspected when the partial match was
+       found. For convenience, the second offset points  to  the  end  of  the
+       string so that a substring can easily be extracted.

+       For  the majority of patterns, the first offset identifies the start of
+       the partially matched string. However, for patterns that contain  look-
+       behind  assertions,  or  \K, or begin with \b or \B, earlier characters
+       have been inspected while carrying out the match. For example:
+
+         /(?<=abc)123/
+
+       This pattern matches "123", but only if it is preceded by "abc". If the
+       subject string is "xyzabc12", the offsets after a partial match are for
+       the substring "abc12", because  all  these  characters  are  needed  if
+       another match is tried with extra characters added.
+
+       If  there  is more than one partial match, the first one that was found
+       provides the data that is returned. Consider this pattern:
+
          /123\w+X|dogY/

        If this is matched against the subject string "abc123dog", both  alter-
        natives  fail  to  match,  but the end of the subject is reached during
        matching,   so    PCRE_ERROR_PARTIAL    is    returned    instead    of
        PCRE_ERROR_NOMATCH.  The  offsets  are  set  to  3  and  9, identifying
-       "123dog" as the longest partial match that was found. (In this example,
+       "123dog" as the first partial match that was found. (In  this  example,
        there  are  two  partial  matches,  because  "dog" on its own partially
        matches the second alternative.)

@@ -5508,64 +5617,65 @@
        there  have  been  no complete matches. Otherwise, the complete matches
        are returned.  However, if PCRE_PARTIAL_HARD is set,  a  partial  match
        takes  precedence  over any complete matches. The portion of the string
-       that provided the longest partial match is set as  the  first  matching
-       string, provided there are at least two slots in the offsets vector.
+       that was inspected when the longest partial match was found is  set  as
+       the first matching string, provided there are at least two slots in the
+       offsets vector.

-       Because  pcre_dfa_exec()  always searches for all possible matches, and
-       there is no difference between greedy and ungreedy repetition, its  be-
+       Because pcre_dfa_exec() always searches for all possible  matches,  and
+       there  is no difference between greedy and ungreedy repetition, its be-
        haviour is different from pcre_exec when PCRE_PARTIAL_HARD is set. Con-
-       sider the string "dog"  matched  against  the  ungreedy  pattern  shown
+       sider  the  string  "dog"  matched  against  the ungreedy pattern shown
        above:

          /dog(sbody)??/

-       Whereas  pcre_exec()  stops  as soon as it finds the complete match for
+       Whereas pcre_exec() stops as soon as it finds the  complete  match  for
        "dog", pcre_dfa_exec() also finds the partial match for "dogsbody", and
        so returns that when PCRE_PARTIAL_HARD is set.

PARTIAL MATCHING AND WORD BOUNDARIES

-       If  a  pattern ends with one of sequences \w or \W, which test for word
-       boundaries, partial matching with PCRE_PARTIAL_SOFT can  give  counter-
+       If a pattern ends with one of the sequences \b or \B, which test for word
+       boundaries,  partial  matching with PCRE_PARTIAL_SOFT can give counter-
        intuitive results. Consider this pattern:

          /\bcat\b/

        This matches "cat", provided there is a word boundary at either end. If
        the subject string is "the cat", the comparison of the final "t" with a
-       following  character  cannot  take  place, so a partial match is found.
-       However, pcre_exec() carries on with normal matching, which matches  \b
-       at  the  end  of  the subject when the last character is a letter, thus
+       following character cannot take place, so a  partial  match  is  found.
+       However,  pcre_exec() carries on with normal matching, which matches \b
+       at the end of the subject when the last character  is  a  letter,  thus
        finding a complete match. The result, therefore, is not PCRE_ERROR_PAR-
-       TIAL.  The  same  thing  happens  with pcre_dfa_exec(), because it also
+       TIAL. The same thing happens  with  pcre_dfa_exec(),  because  it  also
        finds the complete match.

-       Using PCRE_PARTIAL_HARD in this  case  does  yield  PCRE_ERROR_PARTIAL,
+       Using  PCRE_PARTIAL_HARD  in  this  case does yield PCRE_ERROR_PARTIAL,
        because then the partial match takes precedence.

FORMERLY RESTRICTED PATTERNS

        For releases of PCRE prior to 8.00, because of the way certain internal
-       optimizations  were  implemented  in  the  pcre_exec()  function,   the
-       PCRE_PARTIAL  option  (predecessor  of  PCRE_PARTIAL_SOFT) could not be
-       used with all patterns. From release 8.00 onwards, the restrictions  no
-       longer  apply,  and  partial matching with pcre_exec() can be requested
+       optimizations   were  implemented  in  the  pcre_exec()  function,  the
+       PCRE_PARTIAL option (predecessor of  PCRE_PARTIAL_SOFT)  could  not  be
+       used  with all patterns. From release 8.00 onwards, the restrictions no
+       longer apply, and partial matching with pcre_exec()  can  be  requested
        for any pattern.

        Items that were formerly restricted were repeated single characters and
-       repeated  metasequences. If PCRE_PARTIAL was set for a pattern that did
-       not conform to the restrictions, pcre_exec() returned  the  error  code
-       PCRE_ERROR_BADPARTIAL  (-13).  This error code is no longer in use. The
-       PCRE_INFO_OKPARTIAL call to pcre_fullinfo() to find out if  a  compiled
+       repeated metasequences. If PCRE_PARTIAL was set for a pattern that  did
+       not  conform  to  the restrictions, pcre_exec() returned the error code
+       PCRE_ERROR_BADPARTIAL (-13). This error code is no longer in  use.  The
+       PCRE_INFO_OKPARTIAL  call  to pcre_fullinfo() to find out if a compiled
        pattern can be used for partial matching now always returns 1.

EXAMPLE OF PARTIAL MATCHING USING PCRETEST

-       If  the  escape  sequence  \P  is  present in a pcretest data line, the
-       PCRE_PARTIAL_SOFT option is used for  the  match.  Here  is  a  run  of
+       If the escape sequence \P is present  in  a  pcretest  data  line,  the
+       PCRE_PARTIAL_SOFT  option  is  used  for  the  match.  Here is a run of
        pcretest that uses the date example quoted above:

            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@@ -5581,24 +5691,24 @@
          data> j\P
          No match

-       The  first  data  string  is  matched completely, so pcretest shows the
-       matched substrings. The remaining four strings do not  match  the  com-
+       The first data string is matched  completely,  so  pcretest  shows  the
+       matched  substrings.  The  remaining four strings do not match the com-
        plete pattern, but the first two are partial matches. Similar output is
        obtained when pcre_dfa_exec() is used.

-       If the escape sequence \P is present more than once in a pcretest  data
+       If  the escape sequence \P is present more than once in a pcretest data
        line, the PCRE_PARTIAL_HARD option is set for the match.

MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()

        When a partial match has been found using pcre_dfa_exec(), it is possi-
-       ble to continue the match by  providing  additional  subject  data  and
-       calling  pcre_dfa_exec()  again  with the same compiled regular expres-
-       sion, this time setting the PCRE_DFA_RESTART option. You must pass  the
+       ble  to  continue  the  match  by providing additional subject data and
+       calling pcre_dfa_exec() again with the same  compiled  regular  expres-
+       sion,  this time setting the PCRE_DFA_RESTART option. You must pass the
        same working space as before, because this is where details of the pre-
-       vious partial match are stored. Here  is  an  example  using  pcretest,
-       using  the  \R  escape  sequence to set the PCRE_DFA_RESTART option (\D
+       vious  partial  match  are  stored.  Here is an example using pcretest,
+       using the \R escape sequence to set  the  PCRE_DFA_RESTART  option  (\D
        specifies the use of pcre_dfa_exec()):

            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@@ -5607,26 +5717,26 @@
          data> n05\R\D
           0: n05

-       The first call has "23ja" as the subject, and requests  partial  match-
-       ing;  the  second  call  has  "n05"  as  the  subject for the continued
-       (restarted) match.  Notice that when the match is  complete,  only  the
-       last  part  is  shown;  PCRE  does not retain the previously partially-
-       matched string. It is up to the calling program to do that if it  needs
+       The  first  call has "23ja" as the subject, and requests partial match-
+       ing; the second call  has  "n05"  as  the  subject  for  the  continued
+       (restarted)  match.   Notice  that when the match is complete, only the
+       last part is shown; PCRE does  not  retain  the  previously  partially-
+       matched  string. It is up to the calling program to do that if it needs
        to.

-       You  can  set  the  PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with
-       PCRE_DFA_RESTART to continue partial matching over  multiple  segments.
-       This  facility  can  be  used  to  pass  very  long  subject strings to
+       You can set the PCRE_PARTIAL_SOFT  or  PCRE_PARTIAL_HARD  options  with
+       PCRE_DFA_RESTART  to  continue partial matching over multiple segments.
+       This facility can  be  used  to  pass  very  long  subject  strings  to
        pcre_dfa_exec().

MULTI-SEGMENT MATCHING WITH pcre_exec()

-       From release 8.00, pcre_exec() can also be  used  to  do  multi-segment
-       matching.  Unlike  pcre_dfa_exec(),  it  is not possible to restart the
-       previous match with a new segment of data. Instead, new  data  must  be
-       added  to  the  previous  subject  string, and the entire match re-run,
-       starting from the point where the partial match occurred. Earlier  data
+       From  release  8.00,  pcre_exec()  can also be used to do multi-segment
+       matching. Unlike pcre_dfa_exec(), it is not  possible  to  restart  the
+       previous  match  with  a new segment of data. Instead, new data must be
+       added to the previous subject string,  and  the  entire  match  re-run,
+       starting  from the point where the partial match occurred. Earlier data
        can be discarded.  Consider an unanchored pattern that matches dates:

            re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
@@ -5634,39 +5744,45 @@
          Partial match: 23ja

        At this stage, an application could discard the text preceding  "23ja",
-       add on text from the next segment, and call pcre_exec()  again.  Unlike
-       pcre_dfa_exec(),  the  entire matching string must always be available,
-       and the complete matching process occurs for each call, so more  memory
+       add  on  text from the next segment, and call pcre_exec() again. Unlike
+       pcre_dfa_exec(), the entire matching string must always  be  available,
+       and  the complete matching process occurs for each call, so more memory
        and more processing time are needed.

+       Note: If the pattern contains lookbehind assertions, or \K,  or  starts
+       with  \b  or  \B,  the string that is returned for a partial match will
+       include characters that precede the partially  matched  string  itself,
+       because  these  must  be  retained when adding on more characters for a
+       subsequent matching attempt.

+
ISSUES WITH MULTI-SEGMENT MATCHING

        Certain types of pattern may give problems with multi-segment matching,
        whichever matching function is used.

-       1. If the pattern contains tests for the beginning or end  of  a  line,
-       you  need  to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropri-
-       ate, when the subject string for any call does not contain  the  begin-
+       1.  If  the  pattern contains tests for the beginning or end of a line,
+       you need to pass the PCRE_NOTBOL or PCRE_NOTEOL options,  as  appropri-
+       ate,  when  the subject string for any call does not contain the begin-
        ning or end of a line.

-       2.  If  the  pattern contains backward assertions (including \b or \B),
-       you need to arrange for some overlap in the subject  strings  to  allow
-       for  them  to  be  correctly tested at the start of each substring. For
-       example, using pcre_dfa_exec(), you could pass the  subject  in  chunks
-       that  are 500 bytes long, but in a buffer of 700 bytes, with the start-
-       ing offset set to 200 and the previous 200 bytes at the  start  of  the
-       buffer.
+       2. Lookbehind assertions at the start of a pattern are catered  for  in
+       the  offsets that are returned for a partial match. However, in theory,
+       a lookbehind assertion later in the pattern could require even  earlier
+       characters  to  be inspected, and it might not have been reached when a
+       partial match occurs. This is probably an extremely unlikely case;  you
+       could  guard  against  it to a certain extent by always including extra
+       characters at the start.

-       3.  Matching  a subject string that is split into multiple segments may
-       not always produce exactly the same result as matching over one  single
-       long  string,  especially  when  PCRE_PARTIAL_SOFT is used. The section
-       "Partial Matching and Word Boundaries" above describes  an  issue  that
-       arises  if  the  pattern ends with \b or \B. Another kind of difference
-       may occur when there are multiple  matching  possibilities,  because  a
+       3. Matching a subject string that is split into multiple  segments  may
+       not  always produce exactly the same result as matching over one single
+       long string, especially when PCRE_PARTIAL_SOFT  is  used.  The  section
+       "Partial  Matching  and  Word Boundaries" above describes an issue that
+       arises if the pattern ends with \b or \B. Another  kind  of  difference
+       may  occur  when  there  are multiple matching possibilities, because a
        partial match result is given only when there are no completed matches.
        This means that as soon as the shortest match has been found, continua-
-       tion  to  a  new subject segment is no longer possible.  Consider again
+       tion to a new subject segment is no longer  possible.   Consider  again
        this pcretest example:

            re> /dog(sbody)?/
@@ -5680,17 +5796,17 @@
           0: dogsbody
           1: dog

-       The first data line passes the string "dogsb" to  pcre_exec(),  setting
-       the  PCRE_PARTIAL_SOFT  option.  Although the string is a partial match
-       for "dogsbody", the  result  is  not  PCRE_ERROR_PARTIAL,  because  the
-       shorter  string  "dog" is a complete match. Similarly, when the subject
-       is presented to pcre_dfa_exec() in several parts ("do" and "gsb"  being
+       The  first  data line passes the string "dogsb" to pcre_exec(), setting
+       the PCRE_PARTIAL_SOFT option. Although the string is  a  partial  match
+       for  "dogsbody",  the  result  is  not  PCRE_ERROR_PARTIAL, because the
+       shorter string "dog" is a complete match. Similarly, when  the  subject
+       is  presented to pcre_dfa_exec() in several parts ("do" and "gsb" being
        the first two) the match stops when "dog" has been found, and it is not
-       possible to continue. On the other hand, if "dogsbody" is presented  as
+       possible  to continue. On the other hand, if "dogsbody" is presented as
        a single string, pcre_dfa_exec() finds both matches.

        Because of these problems, it is probably best to use PCRE_PARTIAL_HARD
-       when matching multi-segment data. The example above then  behaves  dif-
+       when  matching  multi-segment data. The example above then behaves dif-
        ferently:

            re> /dog(sbody)?/
@@ -5703,25 +5819,25 @@

        4. Patterns that contain alternatives at the top level which do not all
-       start with the  same  pattern  item  may  not  work  as  expected  when
+       start  with  the  same  pattern  item  may  not  work  as expected when
        pcre_dfa_exec() is used. For example, consider this pattern:

          1234|3789

-       If  the  first  part of the subject is "ABC123", a partial match of the
-       first alternative is found at offset 3. There is no partial  match  for
+       If the first part of the subject is "ABC123", a partial  match  of  the
+       first  alternative  is found at offset 3. There is no partial match for
        the second alternative, because such a match does not start at the same
-       point in the subject string. Attempting to  continue  with  the  string
-       "7890"  does  not  yield  a  match because only those alternatives that
-       match at one point in the subject are remembered.  The  problem  arises
-       because  the  start  of the second alternative matches within the first
-       alternative. There is no problem with  anchored  patterns  or  patterns
+       point  in  the  subject  string. Attempting to continue with the string
+       "7890" does not yield a match  because  only  those  alternatives  that
+       match  at  one  point in the subject are remembered. The problem arises
+       because the start of the second alternative matches  within  the  first
+       alternative.  There  is  no  problem with anchored patterns or patterns
        such as:

          1234|ABCD

-       where  no  string can be a partial match for both alternatives. This is
-       not a problem if pcre_exec() is used, because the entire match  has  to
+       where no string can be a partial match for both alternatives.  This  is
+       not  a  problem if pcre_exec() is used, because the entire match has to
        be rerun each time:

            re> /1234|3789/
@@ -5740,7 +5856,7 @@

REVISION

-       Last updated: 31 August 2009
+       Last updated: 05 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
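
The buffer handling that the multi-segment pcre_exec() text describes
(discard the text before the partial match, add on the next segment, and
match again) can be sketched as below. This is only an illustration under
stated assumptions: carry_over() is a hypothetical helper, not part of the
PCRE API, and the caller is assumed to already know the offset at which the
partial match starts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE.
 * buf holds len bytes of subject text in a buffer of cap bytes, and a
 * partial match was reported as starting at offset start. The bytes before
 * start are dropped, the next segment is appended, and the new subject
 * length is returned. Returns (size_t)-1 if the result would not fit. */
static size_t carry_over(char *buf, size_t len, size_t cap, size_t start,
                         const char *seg, size_t seg_len)
{
    assert(start <= len && len <= cap);

    size_t keep = len - start;          /* bytes of the partial match */
    if (seg_len > cap - keep)
        return (size_t)-1;              /* caller must enlarge the buffer */

    memmove(buf, buf + start, keep);    /* slide the retained text down */
    memcpy(buf + keep, seg, seg_len);   /* add on the next segment */
    return keep + seg_len;
}
```

After such a call the whole buffer would be passed to pcre_exec() again
from offset zero, which is why the text notes that the complete matching
string must always be available.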

@@ -6062,6 +6178,10 @@
        easier to slot in PCRE as a replacement library.  Other  POSIX  options
        are not even defined.

+       There  are also some other options that are not defined by POSIX. These
+       have been added at the request of users who want to make use of certain
+       PCRE-specific features via the POSIX calling interface.
+
        When  PCRE  is  called  via these functions, it is only the API that is
        POSIX-like in style. The syntax and semantics of  the  regular  expres-
        sions  themselves  are  still  those of Perl, subject to the setting of
@@ -6116,6 +6236,12 @@
        ing,  the  nmatch  and  pmatch  arguments  are ignored, and no captured
        strings are returned.

+         REG_UNGREEDY
+
+       The PCRE_UNGREEDY option is set when the regular expression  is  passed
+       for  compilation  to the native function. Note that REG_UNGREEDY is not
+       part of the POSIX standard.
+
          REG_UTF8

        The PCRE_UTF8 option is set when the regular expression is  passed  for
@@ -6128,7 +6254,7 @@
        semantics.  In particular, the way it handles newline characters in the
        subject string is the Perl way, not the POSIX way.  Note  that  setting
        PCRE_MULTILINE  has only some of the effects specified for REG_NEWLINE.
-       It does not affect the way newlines are matched by . (they  aren't)  or
+       It does not affect the way newlines are matched by . (they are not)  or
        by a negative class such as [^a] (they are).

        The  yield of regcomp() is zero on success, and non-zero otherwise. The
@@ -6215,36 +6341,39 @@
        matched strings  is  returned.  The  nmatch  and  pmatch  arguments  of
        regexec() are ignored.

+       If the value of nmatch is zero, or if the value of pmatch is  NULL,  no
+       data about any matched strings is returned.
+
        Otherwise, the portion of the string that was matched, and also any
        captured substrings, are returned via the pmatch argument, which points to
-       an  array  of nmatch structures of type regmatch_t, containing the mem-
-       bers rm_so and rm_eo. These contain the offset to the  first  character
-       of  each  substring and the offset to the first character after the end
-       of each substring, respectively. The 0th element of the vector  relates
-       to  the  entire portion of string that was matched; subsequent elements
-       relate to the capturing subpatterns of the regular  expression.  Unused
+       an array of nmatch structures of type regmatch_t, containing  the  mem-
+       bers  rm_so  and rm_eo. These contain the offset to the first character
+       of each substring and the offset to the first character after  the  end
+       of  each substring, respectively. The 0th element of the vector relates
+       to the entire portion of string that was matched;  subsequent  elements
+       relate  to  the capturing subpatterns of the regular expression. Unused
        entries in the array have both structure members set to -1.

-       A  successful  match  yields  a  zero  return;  various error codes are
-       defined in the header file, of  which  REG_NOMATCH  is  the  "expected"
+       A successful match yields  a  zero  return;  various  error  codes  are
+       defined  in  the  header  file,  of which REG_NOMATCH is the "expected"
        failure code.

ERROR MESSAGES

        The regerror() function maps a non-zero errorcode from either regcomp()
-       or regexec() to a printable message. If preg is  not  NULL,  the  error
+       or  regexec()  to  a  printable message. If preg is not NULL, the error
        should have arisen from the use of that structure. A message terminated
-       by a binary zero is placed  in  errbuf.  The  length  of  the  message,
-       including  the  zero, is limited to errbuf_size. The yield of the func-
+       by  a  binary  zero  is  placed  in  errbuf. The length of the message,
+       including the zero, is limited to errbuf_size. The yield of  the  func-
        tion is the size of buffer needed to hold the whole message.

MEMORY USAGE

-       Compiling a regular expression causes memory to be allocated and  asso-
-       ciated  with  the preg structure. The function regfree() frees all such
-       memory, after which preg may no longer be used as  a  compiled  expres-
+       Compiling  a regular expression causes memory to be allocated and asso-
+       ciated with the preg structure. The function regfree() frees  all  such
+       memory,  after  which  preg may no longer be used as a compiled expres-
        sion.

@@ -6257,7 +6386,7 @@

REVISION

-       Last updated: 15 August 2009
+       Last updated: 02 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
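
The description of pmatch above (element 0 for the whole match, later
elements for capturing subpatterns, unused entries set to -1) can be seen
with a small sketch. It is an illustration only: it is written against the
system <regex.h> as a stand-in for the PCRE POSIX wrapper, which offers the
same calling interface, so REG_EXTENDED is passed to get parentheses and
alternation from the system library. match_offsets() is a hypothetical
helper name, and none of the PCRE-specific options such as REG_UNGREEDY are
used.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile pattern, match it against subject, and let
 * regexec() fill pmatch[0..nmatch-1]. Returns zero on a successful match,
 * REG_NOMATCH if there is no match, or another non-zero error code. */
static int match_offsets(const char *pattern, const char *subject,
                         regmatch_t *pmatch, size_t nmatch)
{
    assert(pattern != NULL && subject != NULL);

    regex_t preg;
    int rc = regcomp(&preg, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;                      /* compilation failed */

    rc = regexec(&preg, subject, nmatch, pmatch, 0);
    regfree(&preg);                     /* release memory tied to preg */
    return rc;
}
```

Matching "(a)|(b)" against "xxb" with a three-element array gives offsets
2 and 3 in elements 0 and 2, while element 1, whose group did not take
part in the match, has both members set to -1.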

Modified: code/trunk/doc/pcrecompat.3
===================================================================
--- code/trunk/doc/pcrecompat.3    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/pcrecompat.3    2009-09-18 19:12:35 UTC (rev 453)
@@ -69,7 +69,7 @@
 .P
 8. Fairly obviously, PCRE does not support the (?{code}) and (??{code})
 constructions. However, there is support for recursive patterns. This is not
-available in Perl 5.8, but will be in Perl 5.10. Also, the PCRE "callout"
+available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE "callout"
 feature allows an external function to be called during pattern matching. See
 the
 .\" HREF
@@ -78,7 +78,17 @@
 documentation for details.
 .P
 9. Subpatterns that are called recursively or as "subroutines" are always
-treated as atomic groups in PCRE. This is like Python, but unlike Perl.
+treated as atomic groups in PCRE. This is like Python, but unlike Perl. There
+is a discussion of an example that explains this in more detail in the
+.\" HTML <a href="pcrepattern.html#recursiondifference">
+.\" </a>
+section on recursion differences from Perl
+.\"
+in the
+.\" HREF
+\fBpcrepattern\fP
+.\"
+page.
 .P
 10. There are some differences that are concerned with the settings of captured
 strings when part of a pattern is repeated. For example, matching "aba" against
@@ -145,6 +155,6 @@
 .rs
 .sp
 .nf
-Last updated: 16 September 2009
+Last updated: 18 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi

Modified: code/trunk/doc/pcredemo.3
===================================================================
--- code/trunk/doc/pcredemo.3    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/pcredemo.3    2009-09-18 19:12:35 UTC (rev 453)
@@ -240,12 +240,12 @@
 *                                                                        *
 * If the previous match WAS for an empty string, we can't do that, as it *
 * would lead to an infinite loop. Instead, a special call of pcre_exec() *
-* is made with the PCRE_NOTEMPTY and PCRE_ANCHORED flags set. The first  *
-* of these tells PCRE that an empty string is not a valid match; other   *
-* possibilities must be tried. The second flag restricts PCRE to one     *
-* match attempt at the initial string position. If this match succeeds,  *
-* an alternative to the empty string match has been found, and we can    *
-* proceed round the loop.                                                *
+* is made with the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED flags set.    *
+* The first of these tells PCRE that an empty string at the start of the *
+* subject is not a valid match; other possibilities must be tried. The   *
+* second flag restricts PCRE to one match attempt at the initial string  *
+* position. If this match succeeds, an alternative to the empty string   *
+* match has been found, and we can proceed round the loop.               *
 *************************************************************************/

 if (!find_all)
@@ -268,7 +268,7 @@
   if (ovector[0] == ovector[1])
     {
     if (ovector[0] == subject_length) break;
-    options = PCRE_NOTEMPTY | PCRE_ANCHORED;
+    options = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED;
     }

/* Run the next matching operation */

Modified: code/trunk/doc/pcregrep.txt
===================================================================
--- code/trunk/doc/pcregrep.txt    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/pcregrep.txt    2009-09-18 19:12:35 UTC (rev 453)
@@ -70,203 +70,204 @@
        of the above options is used.

        Patterns  that can match an empty string are accepted, but empty string
-       matches are not recognized. An example is the pattern "(super)?(man)?",
-       in  which  all  components  are optional. This pattern finds all occur-
-       rences of both "super" and "man"; the output differs from matching with
-       "super|man" when only the matching substrings are being shown.
+       matches   are   never   recognized.   An   example   is   the   pattern
+       "(super)?(man)?",  in  which  all components are optional. This pattern
+       finds all occurrences of both "super" and  "man";  the  output  differs
+       from  matching  with  "super|man" when only the matching substrings are
+       being shown.

-       If  the  LC_ALL  or LC_CTYPE environment variable is set, pcregrep uses
-       the value to set a locale when calling the PCRE library.  The  --locale
+       If the LC_ALL or LC_CTYPE environment variable is  set,  pcregrep  uses
+       the  value to set a locale when calling the PCRE library.  The --locale
        option can be used to override this.

SUPPORT FOR COMPRESSED FILES

-       It  is  possible  to compile pcregrep so that it uses libz or libbz2 to
-       read files whose names end in .gz or .bz2, respectively. You  can  find
+       It is possible to compile pcregrep so that it uses libz  or  libbz2  to
+       read  files  whose names end in .gz or .bz2, respectively. You can find
        out whether your binary has support for one or both of these file types
        by running it with the --help option. If the appropriate support is not
-       present,  files are treated as plain text. The standard input is always
+       present, files are treated as plain text. The standard input is  always
        so treated.

OPTIONS

-       The order in which some of the options appear can  affect  the  output.
-       For  example,  both  the  -h and -l options affect the printing of file
-       names. Whichever comes later in the command line will be the  one  that
+       The  order  in  which some of the options appear can affect the output.
+       For example, both the -h and -l options affect  the  printing  of  file
+       names.  Whichever  comes later in the command line will be the one that
        takes effect.

-       --        This  terminate the list of options. It is useful if the next
-                 item on the command line starts with a hyphen but is  not  an
-                 option.  This allows for the processing of patterns and file-
+                 This terminates the list of options. It is useful if the next
+                 item  on  the command line starts with a hyphen but is not an
+                 option. This allows for the processing of patterns and  file-
                  names that start with hyphens.

        -A number, --after-context=number
-                 Output number lines of context after each matching  line.  If
+                 Output  number  lines of context after each matching line. If
                  filenames and/or line numbers are being output, a hyphen sep-
-                 arator is used instead of a colon for the  context  lines.  A
-                 line  containing  "--" is output between each group of lines,
-                 unless they are in fact contiguous in  the  input  file.  The
-                 value  of number is expected to be relatively small. However,
+                 arator  is  used  instead of a colon for the context lines. A
+                 line containing "--" is output between each group  of  lines,
+                 unless  they  are  in  fact contiguous in the input file. The
+                 value of number is expected to be relatively small.  However,
                  pcregrep guarantees to have up to 8K of following text avail-
                  able for context output.

        -B number, --before-context=number
-                 Output  number lines of context before each matching line. If
+                 Output number lines of context before each matching line.  If
                  filenames and/or line numbers are being output, a hyphen sep-
-                 arator  is  used  instead of a colon for the context lines. A
-                 line containing "--" is output between each group  of  lines,
-                 unless  they  are  in  fact contiguous in the input file. The
-                 value of number is expected to be relatively small.  However,
+                 arator is used instead of a colon for the  context  lines.  A
+                 line  containing  "--" is output between each group of lines,
+                 unless they are in fact contiguous in  the  input  file.  The
+                 value  of number is expected to be relatively small. However,
                  pcregrep guarantees to have up to 8K of preceding text avail-
                  able for context output.

        -C number, --context=number
-                 Output number lines of context both  before  and  after  each
-                 matching  line.  This is equivalent to setting both -A and -B
+                 Output  number  lines  of  context both before and after each
+                 matching line.  This is equivalent to setting both -A and  -B
                  to the same value.

        -c, --count
-                 Do not output individual lines from the files that are  being
+                 Do  not output individual lines from the files that are being
                  scanned; instead output the number of lines that would other-
-                 wise have been shown. If no lines are  selected,  the  number
-                 zero  is  output.  If  several files are are being scanned, a
-                 count is output for each of them. However,  if  the  --files-
-                 with-matches  option  is  also  used,  only those files whose
+                 wise  have  been  shown. If no lines are selected, the number
+                 zero  is  output.  If  several  files  are  being  scanned,  a
+                 count  is  output  for each of them. However, if the --files-
+                 with-matches option is also  used,  only  those  files  whose
                  counts are greater than zero are listed. When -c is used, the
                  -A, -B, and -C options are ignored.

        --colour, --color
                  If this option is given without any data, it is equivalent to
-                 "--colour=auto".  If data is required, it must  be  given  in
+                 "--colour=auto".   If  data  is required, it must be given in
                  the same shell item, separated by an equals sign.

        --colour=value, --color=value
                  This option specifies under what circumstances the parts of a
                  line that matched a pattern should be coloured in the output.
-                 By  default,  the output is not coloured. The value (which is
-                 optional, see above) may be "never", "always", or "auto".  In
-                 the  latter case, colouring happens only if the standard out-
-                 put is connected to a terminal. More resources are used  when
-                 colouring  is enabled, because pcregrep has to search for all
-                 possible matches in a line, not just one, in order to  colour
+                 By default, the output is not coloured. The value  (which  is
+                 optional,  see above) may be "never", "always", or "auto". In
+                 the latter case, colouring happens only if the standard  out-
+                 put  is connected to a terminal. More resources are used when
+                 colouring is enabled, because pcregrep has to search for  all
+                 possible  matches in a line, not just one, in order to colour
                  them all.

                  The colour that is used can be specified by setting the envi-
                  ronment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value
                  of this variable should be a string of two numbers, separated
-                 by a semicolon. They are copied  directly  into  the  control
-                 string  for  setting  colour  on  a  terminal,  so it is your
-                 responsibility to ensure that they make sense. If neither  of
-                 the  environment  variables  is  set,  the default is "1;31",
+                 by  a  semicolon.  They  are copied directly into the control
+                 string for setting colour  on  a  terminal,  so  it  is  your
+                 responsibility  to ensure that they make sense. If neither of
+                 the environment variables is  set,  the  default  is  "1;31",
                  which gives red.

        -D action, --devices=action
-                 If an input path is  not  a  regular  file  or  a  directory,
-                 "action"  specifies  how  it is to be processed. Valid values
+                 If  an  input  path  is  not  a  regular file or a directory,
+                 "action" specifies how it is to be  processed.  Valid  values
                  are "read" (the default) or "skip" (silently skip the path).

        -d action, --directories=action
                  If an input path is a directory, "action" specifies how it is
-                 to  be  processed.   Valid  values  are "read" (the default),
-                 "recurse" (equivalent to the -r option), or "skip"  (silently
-                 skip  the path). In the default case, directories are read as
-                 if they were ordinary files. In some  operating  systems  the
-                 effect  of reading a directory like this is an immediate end-
+                 to be processed.  Valid  values  are  "read"  (the  default),
+                 "recurse"  (equivalent to the -r option), or "skip" (silently
+                 skip the path). In the default case, directories are read  as
+                 if  they  were  ordinary files. In some operating systems the
+                 effect of reading a directory like this is an immediate  end-
                  of-file.

        -e pattern, --regex=pattern, --regexp=pattern
                  Specify a pattern to be matched. This option can be used mul-
                  tiple times in order to specify several patterns. It can also
-                 be used as a way of specifying a single pattern  that  starts
-                 with  a hyphen. When -e is used, no argument pattern is taken
-                 from the command line; all  arguments  are  treated  as  file
-                 names.  There is an overall maximum of 100 patterns. They are
-                 applied to each line in the order in which they  are  defined
+                 be  used  as a way of specifying a single pattern that starts
+                 with a hyphen. When -e is used, no argument pattern is  taken
+                 from  the  command  line;  all  arguments are treated as file
+                 names. There is an overall maximum of 100 patterns. They  are
+                 applied  to  each line in the order in which they are defined
                  until one matches (or fails to match if -v is used). If -f is
-                 used with -e, the command line patterns  are  matched  first,
-                 followed  by  the  patterns from the file, independent of the
-                 order in which these options are specified. Note that  multi-
+                 used  with  -e,  the command line patterns are matched first,
+                 followed by the patterns from the file,  independent  of  the
+                 order  in which these options are specified. Note that multi-
                  ple use of -e is not the same as a single pattern with alter-
                  natives. For example, X|Y finds the first character in a line
-                 that  is  X or Y, whereas if the two patterns are given sepa-
+                 that is X or Y, whereas if the two patterns are  given  sepa-
                  rately, pcregrep finds X if it is present, even if it follows
-                 Y  in the line. It finds Y only if there is no X in the line.
-                 This really matters only if you are  using  -o  to  show  the
+                 Y in the line. It finds Y only if there is no X in the  line.
+                 This  really  matters  only  if  you are using -o to show the
                  part(s) of the line that matched.

        --exclude=pattern
                  When pcregrep is searching the files in a directory as a con-
-                 sequence of the -r (recursive  search)  option,  any  regular
+                 sequence  of  the  -r  (recursive search) option, any regular
                  files whose names match the pattern are excluded. Subdirecto-
-                 ries are not excluded  by  this  option;  they  are  searched
-                 recursively,  subject  to the --exclude_dir and --include_dir
-                 options. The pattern is a PCRE  regular  expression,  and  is
+                 ries  are  not  excluded  by  this  option; they are searched
+                 recursively, subject to the --exclude_dir  and  --include_dir
+                 options.  The  pattern  is  a PCRE regular expression, and is
                  matched against the final component of the file name (not the
-                 entire path). If a  file  name  matches  both  --include  and
-                 --exclude,  it  is excluded.  There is no short form for this
+                 entire  path).  If  a  file  name  matches both --include and
+                 --exclude, it is excluded.  There is no short form  for  this
                  option.

        --exclude_dir=pattern
-                 When pcregrep is searching the contents of a directory  as  a
-                 consequence  of  the -r (recursive search) option, any subdi-
-                 rectories whose names match the pattern are  excluded.  (Note
-                 that  the  --exclude  option does not affect subdirectories.)
-                 The pattern is a PCRE  regular  expression,  and  is  matched
-                 against  the  final  component  of  the  name (not the entire
-                 path). If a subdirectory name matches both --include_dir  and
-                 --exclude_dir,  it  is  excluded.  There is no short form for
+                 When  pcregrep  is searching the contents of a directory as a
+                 consequence of the -r (recursive search) option,  any  subdi-
+                 rectories  whose  names match the pattern are excluded. (Note
+                 that the --exclude option does  not  affect  subdirectories.)
+                 The  pattern  is  a  PCRE  regular expression, and is matched
+                 against the final component  of  the  name  (not  the  entire
+                 path).  If a subdirectory name matches both --include_dir and
+                 --exclude_dir, it is excluded. There is  no  short  form  for
                  this option.

        -F, --fixed-strings
-                 Interpret each pattern as a list of fixed strings,  separated
-                 by  newlines,  instead  of  as  a  regular expression. The -w
-                 (match as a word) and -x (match whole line)  options  can  be
+                 Interpret  each pattern as a list of fixed strings, separated
+                 by newlines, instead of  as  a  regular  expression.  The  -w
+                 (match  as  a  word) and -x (match whole line) options can be
                  used with -F. They apply to each of the fixed strings. A line
                  is selected if any of the fixed strings are found in it (sub-
                  ject to -w or -x, if present).

        -f filename, --file=filename
-                 Read  a  number  of patterns from the file, one per line, and
-                 match them against each line of input. A data line is  output
+                 Read a number of patterns from the file, one  per  line,  and
+                 match  them against each line of input. A data line is output
                  if any of the patterns match it. The filename can be given as
                  "-" to refer to the standard input. When -f is used, patterns
-                 specified  on  the command line using -e may also be present;
+                 specified on the command line using -e may also  be  present;
                  they are tested before the file's patterns. However, no other
-                 pattern  is  taken  from  the command line; all arguments are
-                 treated as file names. There is an  overall  maximum  of  100
+                 pattern is taken from the command  line;  all  arguments  are
+                 treated  as  file  names.  There is an overall maximum of 100
                  patterns. Trailing white space is removed from each line, and
-                 blank lines are ignored. An empty file contains  no  patterns
-                 and  therefore  matches  nothing. See also the comments about
-                 multiple patterns versus a single pattern  with  alternatives
+                 blank  lines  are ignored. An empty file contains no patterns
+                 and therefore matches nothing. See also  the  comments  about
+                 multiple  patterns  versus a single pattern with alternatives
                  in the description of -e above.

        --file-offsets
-                 Instead  of  showing lines or parts of lines that match, show
-                 each match as an offset from the start  of  the  file  and  a
-                 length,  separated  by  a  comma. In this mode, no context is
-                 shown. That is, the -A, -B, and -C options  are  ignored.  If
+                 Instead of showing lines or parts of lines that  match,  show
+                 each  match  as  an  offset  from the start of the file and a
+                 length, separated by a comma. In this  mode,  no  context  is
+                 shown.  That  is,  the -A, -B, and -C options are ignored. If
                  there is more than one match in a line, each of them is shown
-                 separately. This option is mutually  exclusive  with  --line-
+                 separately.  This  option  is mutually exclusive with --line-
                  offsets and --only-matching.

        -H, --with-filename
-                 Force  the  inclusion  of the filename at the start of output
-                 lines when searching a single file. By default, the  filename
-                 is  not  shown in this case. For matching lines, the filename
+                 Force the inclusion of the filename at the  start  of  output
+                 lines  when searching a single file. By default, the filename
+                 is not shown in this case. For matching lines,  the  filename
                  is followed by a colon; for context lines, a hyphen separator
-                 is  used.  If  a line number is also being output, it follows
+                 is used. If a line number is also being  output,  it  follows
                  the file name.

        -h, --no-filename
-                 Suppress the output filenames when searching multiple  files.
-                 By  default,  filenames  are  shown  when  multiple files are
-                 searched. For matching lines, the filename is followed  by  a
-                 colon;  for  context lines, a hyphen separator is used.  If a
+                 Suppress  the output filenames when searching multiple files.
+                 By default, filenames  are  shown  when  multiple  files  are
+                 searched.  For  matching lines, the filename is followed by a
+                 colon; for context lines, a hyphen separator is used.   If  a
                  line number is also being output, it follows the file name.

-       --help    Output a help message, giving brief details  of  the  command
+       --help    Output  a  help  message, giving brief details of the command
                  options and file type support, and then exit.

        -i, --ignore-case
@@ -276,38 +277,38 @@
                  When pcregrep is searching the files in a directory as a con-
                  sequence of the -r (recursive search) option, only those reg-
                  ular files whose names match the pattern are included. Subdi-
-                 rectories are always included and searched recursively,  sub-
+                 rectories  are always included and searched recursively, sub-
                  ject to the --include_dir and --exclude_dir options. The pat-
                  tern is a PCRE regular expression, and is matched against the
-                 final  component of the file name (not the entire path). If a
+                 final component of the file name (not the entire path). If  a
                  file  name  matches  both  --include  and  --exclude,  it  is
                  excluded. There is no short form for this option.

        --include_dir=pattern
-                 When  pcregrep  is searching the contents of a directory as a
-                 consequence of the -r (recursive search) option,  only  those
-                 subdirectories  whose  names  match the pattern are included.
-                 (Note that the --include option does not  affect  subdirecto-
-                 ries.)  The  pattern  is  a  PCRE  regular expression, and is
-                 matched against the final component  of  the  name  (not  the
-                 entire   path).   If   a   subdirectory   name  matches  both
-                 --include_dir and --exclude_dir, it is excluded. There is  no
+                 When pcregrep is searching the contents of a directory  as  a
+                 consequence  of  the -r (recursive search) option, only those
+                 subdirectories whose names match the  pattern  are  included.
+                 (Note  that  the --include option does not affect subdirecto-
+                 ries.) The pattern is  a  PCRE  regular  expression,  and  is
+                 matched  against  the  final  component  of the name (not the
+                 entire  path).  If   a   subdirectory   name   matches   both
+                 --include_dir  and --exclude_dir, it is excluded. There is no
                  short form for this option.

        -L, --files-without-match
-                 Instead  of  outputting lines from the files, just output the
-                 names of the files that do not contain any lines  that  would
-                 have  been  output. Each file name is output once, on a sepa-
+                 Instead of outputting lines from the files, just  output  the
+                 names  of  the files that do not contain any lines that would
+                 have been output. Each file name is output once, on  a  sepa-
                  rate line.

        -l, --files-with-matches
-                 Instead of outputting lines from the files, just  output  the
+                 Instead  of  outputting lines from the files, just output the
                  names of the files containing lines that would have been out-
-                 put. Each file name is  output  once,  on  a  separate  line.
-                 Searching  normally stops as soon as a matching line is found
-                 in a file. However, if the -c (count) option  is  also  used,
-                 matching  continues in order to obtain the correct count, and
-                 those files that have at least one  match  are  listed  along
+                 put.  Each  file  name  is  output  once, on a separate line.
+                 Searching normally stops as soon as a matching line is  found
+                 in  a  file.  However, if the -c (count) option is also used,
+                 matching continues in order to obtain the correct count,  and
+                 those  files  that  have  at least one match are listed along
                  with their counts. Using this option with -c is a way of sup-
                  pressing the listing of files with no matches.

@@ -317,106 +318,106 @@
                  input)" is used. There is no short form for this option.

        --line-offsets
-                 Instead of showing lines or parts of lines that  match,  show
+                 Instead  of  showing lines or parts of lines that match, show
                  each match as a line number, the offset from the start of the
-                 line, and a length. The line number is terminated by a  colon
-                 (as  usual; see the -n option), and the offset and length are
-                 separated by a comma. In this  mode,  no  context  is  shown.
-                 That  is, the -A, -B, and -C options are ignored. If there is
-                 more than one match in a line, each of them  is  shown  sepa-
+                 line,  and a length. The line number is terminated by a colon
+                 (as usual; see the -n option), and the offset and length  are
+                 separated  by  a  comma.  In  this mode, no context is shown.
+                 That is, the -A, -B, and -C options are ignored. If there  is
+                 more  than  one  match in a line, each of them is shown sepa-
                  rately. This option is mutually exclusive with --file-offsets
                  and --only-matching.

        --locale=locale-name
-                 This option specifies a locale to be used for pattern  match-
-                 ing.  It  overrides the value in the LC_ALL or LC_CTYPE envi-
-                 ronment variables.  If  no  locale  is  specified,  the  PCRE
-                 library's  default (usually the "C" locale) is used. There is
+                 This  option specifies a locale to be used for pattern match-
+                 ing. It overrides the value in the LC_ALL or  LC_CTYPE  envi-
+                 ronment  variables.  If  no  locale  is  specified,  the PCRE
+                 library's default (usually the "C" locale) is used. There  is
                  no short form for this option.

        -M, --multiline
-                 Allow patterns to match more than one line. When this  option
+                 Allow  patterns to match more than one line. When this option
                  is given, patterns may usefully contain literal newline char-
-                 acters and internal occurrences of ^ and  $  characters.  The
-                 output  for  any one match may consist of more than one line.
-                 When this option is set, the PCRE library is called in  "mul-
-                 tiline"  mode.   There is a limit to the number of lines that
-                 can be matched, imposed by the way that pcregrep buffers  the
-                 input  file as it scans it. However, pcregrep ensures that at
+                 acters  and  internal  occurrences of ^ and $ characters. The
+                 output for any one match may consist of more than  one  line.
+                 When  this option is set, the PCRE library is called in "mul-
+                 tiline" mode.  There is a limit to the number of  lines  that
+                 can  be matched, imposed by the way that pcregrep buffers the
+                 input file as it scans it. However, pcregrep ensures that  at
                  least 8K characters or the rest of the document (whichever is
-                 the  shorter)  are  available for forward matching, and simi-
+                 the shorter) are available for forward  matching,  and  simi-
                  larly the previous 8K characters (or all the previous charac-
-                 ters,  if  fewer  than 8K) are guaranteed to be available for
+                 ters, if fewer than 8K) are guaranteed to  be  available  for
                  lookbehind assertions.

        -N newline-type, --newline=newline-type
-                 The PCRE library  supports  five  different  conventions  for
-                 indicating  the  ends of lines. They are the single-character
-                 sequences CR (carriage return) and LF  (linefeed),  the  two-
-                 character  sequence CRLF, an "anycrlf" convention, which rec-
-                 ognizes any of the preceding three types, and an  "any"  con-
+                 The  PCRE  library  supports  five  different conventions for
+                 indicating the ends of lines. They are  the  single-character
+                 sequences  CR  (carriage  return) and LF (linefeed), the two-
+                 character sequence CRLF, an "anycrlf" convention, which  rec-
+                 ognizes  any  of the preceding three types, and an "any" con-
                  vention, in which any Unicode line ending sequence is assumed
-                 to end a line. The Unicode sequences are the three just  men-
-                 tioned,   plus  VT  (vertical  tab,  U+000B),  FF  (formfeed,
-                 U+000C),  NEL  (next  line,  U+0085),  LS  (line   separator,
+                 to  end a line. The Unicode sequences are the three just men-
+                 tioned,  plus  VT  (vertical  tab,  U+000B),  FF   (formfeed,
+                 U+000C),   NEL  (next  line,  U+0085),  LS  (line  separator,
                  U+2028), and PS (paragraph separator, U+2029).

                  When  the  PCRE  library  is  built,  a  default  line-ending
-                 sequence  is  specified.   This  is  normally  the   standard
+                 sequence   is  specified.   This  is  normally  the  standard
                  sequence for the operating system. Unless otherwise specified
-                 by this option, pcregrep uses  the  library's  default.   The
+                 by  this  option,  pcregrep  uses the library's default.  The
                  possible values for this option are CR, LF, CRLF, ANYCRLF, or
-                 ANY. This makes it possible to use  pcregrep  on  files  that
-                 have  come  from  other environments without having to modify
-                 their line endings. If the data that is  being  scanned  does
-                 not  agree  with  the convention set by this option, pcregrep
+                 ANY.  This  makes  it  possible to use pcregrep on files that
+                 have come from other environments without  having  to  modify
+                 their  line  endings.  If the data that is being scanned does
+                 not agree with the convention set by  this  option,  pcregrep
                  may behave in strange ways.

        -n, --line-number
                  Precede each output line by its line number in the file, fol-
-                 lowed  by  a colon for matching lines or a hyphen for context
-                 lines. If the filename is also being output, it precedes  the
+                 lowed by a colon for matching lines or a hyphen  for  context
+                 lines.  If the filename is also being output, it precedes the
                  line number. This option is forced if --line-offsets is used.

        -o, --only-matching
-                 Show  only  the  part  of the line that matched a pattern. In
-                 this mode, no context is shown. That is, the -A, -B,  and  -C
-                 options  are  ignored.  If  there is more than one match in a
-                 line, each of them is shown separately.  If  -o  is  combined
-                 with  -v  (invert the sense of the match to find non-matching
-                 lines), no output is generated, but the return  code  is  set
+                 Show only the part of the line that  matched  a  pattern.  In
+                 this  mode,  no context is shown. That is, the -A, -B, and -C
+                 options are ignored. If there is more than  one  match  in  a
+                 line,  each  of  them  is shown separately. If -o is combined
+                 with -v (invert the sense of the match to  find  non-matching
+                 lines),  no  output  is generated, but the return code is set
                  appropriately. This option is mutually exclusive with --file-
                  offsets and --line-offsets.

        -q, --quiet
                  Work quietly, that is, display nothing except error messages.
-                 The  exit  status  indicates  whether or not any matches were
+                 The exit status indicates whether or  not  any  matches  were
                  found.

        -r, --recursive
-                 If any given path is a directory, recursively scan the  files
-                 it  contains, taking note of any --include and --exclude set-
-                 tings. By default, a directory is read as a normal  file;  in
-                 some  operating  systems this gives an immediate end-of-file.
-                 This option is a shorthand  for  setting  the  -d  option  to
+                 If  any given path is a directory, recursively scan the files
+                 it contains, taking note of any --include and --exclude  set-
+                 tings.  By  default, a directory is read as a normal file; in
+                 some operating systems this gives an  immediate  end-of-file.
+                 This  option  is  a  shorthand  for  setting the -d option to
                  "recurse".

        -s, --no-messages
-                 Suppress  error  messages  about  non-existent  or unreadable
-                 files. Such files are quietly skipped.  However,  the  return
+                 Suppress error  messages  about  non-existent  or  unreadable
+                 files.  Such  files  are quietly skipped. However, the return
                  code is still 2, even if matches were found in other files.

        -u, --utf-8
-                 Operate  in UTF-8 mode. This option is available only if PCRE
-                 has been compiled with UTF-8 support. Both patterns and  sub-
+                 Operate in UTF-8 mode. This option is available only if  PCRE
+                 has  been compiled with UTF-8 support. Both patterns and sub-
                  ject lines must be valid strings of UTF-8 characters.

        -V, --version
-                 Write  the  version  numbers of pcregrep and the PCRE library
+                 Write the version numbers of pcregrep and  the  PCRE  library
                  that is being used to the standard error stream.

        -v, --invert-match
-                 Invert the sense of the match, so that  lines  which  do  not
+                 Invert  the  sense  of  the match, so that lines which do not
                  match any of the patterns are the ones that are found.

        -w, --word-regex, --word-regexp
@@ -424,38 +425,38 @@
                  lent to having \b at the start and end of the pattern.

        -x, --line-regex, --line-regexp
-                 Force the patterns to be anchored (each must  start  matching
-                 at  the beginning of a line) and in addition, require them to
-                 match entire lines. This is equivalent  to  having  ^  and  $
+                 Force  the  patterns to be anchored (each must start matching
+                 at the beginning of a line) and in addition, require them  to
+                 match  entire  lines.  This  is  equivalent to having ^ and $
                  characters at the start and end of each alternative branch in
                  every pattern.

ENVIRONMENT VARIABLES

-       The environment variables LC_ALL and LC_CTYPE  are  examined,  in  that
-       order,  for  a  locale.  The first one that is set is used. This can be
-       overridden by the --locale option.  If  no  locale  is  set,  the  PCRE
+       The  environment  variables  LC_ALL  and LC_CTYPE are examined, in that
+       order, for a locale. The first one that is set is  used.  This  can  be
+       overridden  by  the  --locale  option.  If  no  locale is set, the PCRE
        library's default (usually the "C" locale) is used.

NEWLINES

-       The  -N (--newline) option allows pcregrep to scan files with different
-       newline conventions from the default.  However,  the  setting  of  this
-       option  does not affect the way in which pcregrep writes information to
-       the standard error and output streams. It uses the  string  "\n"  in  C
-       printf()  calls  to  indicate newlines, relying on the C I/O library to
-       convert this to an appropriate sequence if the  output  is  sent  to  a
+       The -N (--newline) option allows pcregrep to scan files with  different
+       newline  conventions  from  the  default.  However, the setting of this
+       option does not affect the way in which pcregrep writes information  to
+       the  standard  error  and  output streams. It uses the string "\n" in C
+       printf() calls to indicate newlines, relying on the C  I/O  library  to
+       convert  this  to  an  appropriate  sequence if the output is sent to a
        file.

OPTIONS COMPATIBILITY

        The majority of short and long forms of pcregrep's options are the same
-       as in the GNU grep program. Any long option of  the  form  --xxx-regexp
-       (GNU  terminology) is also available as --xxx-regex (PCRE terminology).
-       However, the --locale, -M, --multiline, -u,  and  --utf-8  options  are
+       as  in  the  GNU grep program. Any long option of the form --xxx-regexp
+       (GNU terminology) is also available as --xxx-regex (PCRE  terminology).
+       However,  the  --locale,  -M,  --multiline, -u, and --utf-8 options are
        specific to pcregrep. If both the -c and -l options are given, GNU grep
        lists only file names, without counts, but pcregrep gives the counts.

@@ -463,48 +464,48 @@
OPTIONS WITH DATA

        There are four different ways in which an option with data can be spec-
-       ified.   If  a  short  form option is used, the data may follow immedi-
+       ified.  If a short form option is used, the  data  may  follow  immedi-
        ately, or in the next command line item. For example:

          -f/some/file
          -f /some/file

-       If a long form option is used, the data may appear in the same  command
+       If  a long form option is used, the data may appear in the same command
        line item, separated by an equals character, or (with one exception) it
        may appear in the next command line item. For example:

          --file=/some/file
          --file /some/file

-       Note, however, that if you want to supply a file name beginning with  ~
-       as  data  in  a  shell  command,  and have the shell expand ~ to a home
+       Note,  however, that if you want to supply a file name beginning with ~
+       as data in a shell command, and have the  shell  expand  ~  to  a  home
        directory, you must separate the file name from the option, because the
        shell does not treat ~ specially unless it is at the start of an item.

-       The  exception  to  the  above is the --colour (or --color) option, for
-       which the data is optional. If this option does have data, it  must  be
-       given  in  the first form, using an equals character. Otherwise it will
+       The exception to the above is the --colour  (or  --color)  option,  for
+       which  the  data is optional. If this option does have data, it must be
+       given in the first form, using an equals character. Otherwise  it  will
        be assumed that it has no data.

MATCHING ERRORS

-       It is possible to supply a regular expression that takes  a  very  long
-       time  to  fail  to  match certain lines. Such patterns normally involve
-       nested indefinite repeats, for example: (a+)*\d when matched against  a
-       line  of  a's  with  no  final  digit. The PCRE matching function has a
-       resource limit that causes it to abort in these circumstances. If  this
+       It  is  possible  to supply a regular expression that takes a very long
+       time to fail to match certain lines.  Such  patterns  normally  involve
+       nested  indefinite repeats, for example: (a+)*\d when matched against a
+       line of a's with no final digit.  The  PCRE  matching  function  has  a
+       resource  limit that causes it to abort in these circumstances. If this
        happens, pcregrep outputs an error message and the line that caused the
-       problem to the standard error stream. If there are more  than  20  such
+       problem  to  the  standard error stream. If there are more than 20 such
        errors, pcregrep gives up.

DIAGNOSTICS

        Exit status is 0 if any matches were found, 1 if no matches were found,
-       and 2 for syntax errors and non-existent or inacessible files (even  if
-       matches  were  found in other files) or too many matching errors. Using
-       the -s option to suppress error messages about inaccessble  files  does
+       and 2 for syntax errors and non-existent or inaccessible files (even if
+       matches were found in other files) or too many matching  errors.  Using
+       the -s  option to suppress error messages about inaccessible files does
        not affect the return code.

@@ -522,5 +523,5 @@

REVISION

-       Last updated: 12 August 2009
+       Last updated: 13 September 2009
        Copyright (c) 1997-2009 University of Cambridge.

Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/pcrepattern.3    2009-09-18 19:12:35 UTC (rev 453)
@@ -1922,7 +1922,7 @@
 Obviously, PCRE cannot support the interpolation of Perl code. Instead, it
 supports special syntax for recursion of the entire pattern, and also for
 individual subpattern recursion. After its introduction in PCRE and Python,
-this kind of recursion was introduced into Perl at release 5.10.
+this kind of recursion was subsequently introduced into Perl at release 5.10.
 .P
 A special item that consists of (? followed by a number greater than zero and a
 closing parenthesis is a recursive call of the subpattern of the given number,
@@ -1930,11 +1930,6 @@
 call, which is described in the next section.) The special item (?R) or (?0) is
 a recursive call of the entire regular expression.
 .P
-In PCRE (like Python, but unlike Perl), a recursive subpattern call is always
-treated as an atomic group. That is, once it has matched some of the subject
-string, it is never re-entered, even if it contains untried alternatives and
-there is a subsequent matching failure.
-.P
 This PCRE pattern solves the nested parentheses problem (assume the
 PCRE_EXTENDED option is set so that white space is ignored):
 .sp
@@ -2022,6 +2017,70 @@
 is the actual recursive call.
 .
 .
+.\" HTML <a name="recursiondifference"></a>
+.SS "Recursion difference from Perl"
+.rs
+.sp
+In PCRE (like Python, but unlike Perl), a recursive subpattern call is always
+treated as an atomic group. That is, once it has matched some of the subject
+string, it is never re-entered, even if it contains untried alternatives and
+there is a subsequent matching failure. This can be illustrated by the 
+following pattern, which purports to match a palindromic string that contains 
+an odd number of characters (for example, "a", "aba", "abcba", "abcdcba"):
+.sp
+  ^(.|(.)(?1)\e2)$
+.sp
+The idea is that it either matches a single character, or two identical 
+characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE 
+it does not if the subject string is longer than three characters. Consider the
+subject string "abcba":
+.P
+At the top level, the first character is matched, but as it is not at the end 
+of the string, the first alternative fails; the second alternative is taken
+and the recursion kicks in. The recursive call to subpattern 1 successfully
+matches the next character ("b"). (Note that the beginning and end of line
+tests are not part of the recursion.)
+.P
+Back at the top level, the next character ("c") is compared with what
+subpattern 2 matched, which was "a". This fails. Because the recursion is 
+treated as an atomic group, there are now no backtracking points, and so the
+entire match fails. (Perl is able, at this point, to re-enter the recursion and
+try the second alternative.) However, if the pattern is written with the
+alternatives in the other order, things are different:
+.sp
+  ^((.)(?1)\e2|.)$
+.sp
+This time, the recursing alternative is tried first, and continues to recurse 
+until it runs out of characters, at which point the recursion fails. But this 
+time we do have another alternative to try at the higher level. That is the big 
+difference: in the previous case the remaining alternative is at a deeper
+recursion level, which PCRE cannot use.
+.P
+To change the pattern so that it matches all palindromic strings, not just those
+with an odd number of characters, it is tempting to change the pattern to this:
+.sp
+  ^((.)(?1)\e2|.?)$
+.sp
+Again, this works in Perl, but not in PCRE, and for the same reason. When a 
+deeper recursion has matched a single character, it cannot be entered again in 
+order to match an empty string. The solution is to separate the two cases, and 
+write out the odd and even cases as alternatives at the higher level:
+.sp
+  ^(?:((.)(?1)\e2|)|((.)(?3)\e4|.))
+.sp   
+If you want to match typical palindromic phrases, the pattern has to ignore all 
+non-word characters, which can be done like this:
+.sp
+  ^\eW*+(?:((.)\eW*+(?1)\eW*+\e2|)|((.)\eW*+(?3)\eW*+\e4|\eW*+.\eW*+))\eW*+$
+.sp
+If run with the PCRE_CASELESS option, this pattern matches phrases such as "A 
+man, a plan, a canal: Panama!" and it works well in both PCRE and Perl. Note 
+the use of the possessive quantifier *+ to avoid backtracking into sequences of 
+non-word characters. Without this, PCRE takes a great deal longer (ten times or
+more) to match typical phrases, and Perl takes so long that you think it has
+gone into a loop.
+.
+.
 .\" HTML <a name="subpatternsassubroutines"></a>
 .SH "SUBPATTERNS AS SUBROUTINES"
 .rs
@@ -2258,6 +2317,6 @@
 .rs
 .sp
 .nf
-Last updated: 16 September 2009
+Last updated: 18 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
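
The atomic-recursion behaviour described in the new "Recursion difference from Perl" text can be modelled with a small toy matcher. The sketch below is not PCRE or Perl code: it is a hand-written Python model of the two odd-length palindrome patterns only, and the names group_ends and matches and the flags recurse_first and atomic are hypothetical, introduced just for this illustration. With atomic=True the recursive call keeps only the first way it matched (the PCRE behaviour described above); with atomic=False the caller may backtrack into the recursion (the Perl behaviour described above).

```python
from typing import Iterator


def group_ends(s: str, i: int, recurse_first: bool, atomic: bool) -> Iterator[int]:
    """Yield each offset at which group 1 can end when started at offset i,
    in the order a backtracking matcher would try the alternatives."""

    def single() -> Iterator[int]:
        # Alternative ".": consume exactly one character.
        if i < len(s):
            yield i + 1

    def recursing() -> Iterator[int]:
        # Alternative "(.)(?1)\2": one character, a recursive call of
        # group 1, then the same character again.
        if i >= len(s):
            return
        first = s[i]
        inner = group_ends(s, i + 1, recurse_first, atomic)
        if atomic:
            # Atomic recursion: only the first match of the call is kept.
            end = next(inner, None)
            candidates = [] if end is None else [end]
        else:
            # Non-atomic recursion: every way of matching may be tried.
            candidates = inner
        for end in candidates:
            if end < len(s) and s[end] == first:
                yield end + 1

    alternatives = (recursing, single) if recurse_first else (single, recursing)
    for alternative in alternatives:
        yield from alternative()


def matches(s: str, recurse_first: bool, atomic: bool) -> bool:
    """Model ^(...)$: group 1 must start at offset 0 and end at len(s)."""
    return any(end == len(s) for end in group_ends(s, 0, recurse_first, atomic))


if __name__ == "__main__":
    for recurse_first in (False, True):
        for atomic in (False, True):
            print(recurse_first, atomic, matches("abcba", recurse_first, atomic))
```

In this model, recurse_first=False corresponds to ^(.|(.)(?1)\2)$ and recurse_first=True to ^((.)(?1)\2|.)$. Only the combination of the single-character alternative first and atomic recursion rejects "abcba", which is the case the documentation walks through.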

Modified: code/trunk/doc/pcretest.txt
===================================================================
--- code/trunk/doc/pcretest.txt    2009-09-16 19:18:51 UTC (rev 452)
+++ code/trunk/doc/pcretest.txt    2009-09-18 19:12:35 UTC (rev 453)
@@ -196,87 +196,88 @@
        or \B).

        If  any  call  to  pcre_exec()  in a /g or /G sequence matches an empty
-       string, the next call is done with the PCRE_NOTEMPTY and  PCRE_ANCHORED
-       flags  set in order to search for another, non-empty, match at the same
-       point.  If this second match fails, the start  offset  is  advanced  by
-       one,  and  the normal match is retried. This imitates the way Perl han-
-       dles such cases when using the /g modifier or the split() function.
+       string, the next  call  is  done  with  the  PCRE_NOTEMPTY_ATSTART  and
+       PCRE_ANCHORED  flags  set  in  order  to search for another, non-empty,
+       match at the same point. If this second match fails, the  start  offset
+       is  advanced  by  one  character, and the normal match is retried. This
+       imitates the way Perl handles such cases when using the /g modifier  or
+       the split() function.

    Other modifiers

        There are yet more modifiers for controlling the way pcretest operates.

-       The /+ modifier requests that as well as outputting the substring  that
-       matched  the  entire  pattern,  pcretest  should in addition output the
-       remainder of the subject string. This is useful  for  tests  where  the
+       The  /+ modifier requests that as well as outputting the substring that
+       matched the entire pattern, pcretest  should  in  addition  output  the
+       remainder  of  the  subject  string. This is useful for tests where the
        subject contains multiple copies of the same substring.

-       The  /B modifier is a debugging feature. It requests that pcretest out-
-       put a representation of the compiled byte code after compilation.  Nor-
-       mally  this  information contains length and offset values; however, if
-       /Z is also present, this data is replaced by spaces. This is a  special
+       The /B modifier is a debugging feature. It requests that pcretest  out-
+       put  a representation of the compiled byte code after compilation. Nor-
+       mally this information contains length and offset values;  however,  if
+       /Z  is also present, this data is replaced by spaces. This is a special
        feature for use in the automatic test scripts; it ensures that the same
        output is generated for different internal link sizes.

-       The /L modifier must be followed directly by the name of a locale,  for
+       The  /L modifier must be followed directly by the name of a locale, for
        example,

          /pattern/Lfr_FR

        For this reason, it must be the last modifier. The given locale is set,
-       pcre_maketables() is called to build a set of character tables for  the
-       locale,  and  this  is then passed to pcre_compile() when compiling the
-       regular expression. Without an /L  modifier,  NULL  is  passed  as  the
-       tables  pointer; that is, /L applies only to the expression on which it
+       pcre_maketables()  is called to build a set of character tables for the
+       locale, and this is then passed to pcre_compile()  when  compiling  the
+       regular  expression.  Without  an  /L  modifier,  NULL is passed as the
+       tables pointer; that is, /L applies only to the expression on which  it
        appears.

-       The /I modifier requests that pcretest  output  information  about  the
-       compiled  pattern (whether it is anchored, has a fixed first character,
-       and so on). It does this by calling pcre_fullinfo() after  compiling  a
-       pattern.  If  the pattern is studied, the results of that are also out-
+       The  /I  modifier  requests  that pcretest output information about the
+       compiled pattern (whether it is anchored, has a fixed first  character,
+       and  so  on). It does this by calling pcre_fullinfo() after compiling a
+       pattern. If the pattern is studied, the results of that are  also  out-
        put.

-       The /D modifier is a PCRE debugging feature, and is equivalent to  /BI,
+       The  /D modifier is a PCRE debugging feature, and is equivalent to /BI,
        that is, both the /B and the /I modifiers.

        The /F modifier causes pcretest to flip the byte order of the fields in
-       the compiled pattern that  contain  2-byte  and  4-byte  numbers.  This
-       facility  is  for testing the feature in PCRE that allows it to execute
+       the  compiled  pattern  that  contain  2-byte  and 4-byte numbers. This
+       facility is for testing the feature in PCRE that allows it  to  execute
        patterns that were compiled on a host with a different endianness. This
-       feature  is  not  available  when  the POSIX interface to PCRE is being
-       used, that is, when the /P pattern modifier is specified. See also  the
+       feature is not available when the POSIX  interface  to  PCRE  is  being
+       used,  that is, when the /P pattern modifier is specified. See also the
        section about saving and reloading compiled patterns below.

-       The  /S  modifier causes pcre_study() to be called after the expression
+       The /S modifier causes pcre_study() to be called after  the  expression
        has been compiled, and the results used when the expression is matched.

-       The /M modifier causes the size of memory block used to hold  the  com-
+       The  /M  modifier causes the size of memory block used to hold the com-
        piled pattern to be output.

-       The  /P modifier causes pcretest to call PCRE via the POSIX wrapper API
-       rather than its native API. When this  is  done,  all  other  modifiers
-       except  /i,  /m, and /+ are ignored. REG_ICASE is set if /i is present,
-       and REG_NEWLINE is set if /m is present. The  wrapper  functions  force
+       The /P modifier causes pcretest to call PCRE via the POSIX wrapper  API
+       rather  than  its  native  API.  When this is done, all other modifiers
+       except /i, /m, and /+ are ignored. REG_ICASE is set if /i  is  present,
+       and  REG_NEWLINE  is  set if /m is present. The wrapper functions force
        PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.

-       The  /8 modifier causes pcretest to call PCRE with the PCRE_UTF8 option
-       set. This turns on support for UTF-8 character handling in  PCRE,  pro-
-       vided  that  it  was  compiled with this support enabled. This modifier
+       The /8 modifier causes pcretest to call PCRE with the PCRE_UTF8  option
+       set.  This  turns on support for UTF-8 character handling in PCRE, pro-
+       vided that it was compiled with this  support  enabled.  This  modifier
        also causes any non-printing characters in output strings to be printed
        using the \x{hh...} notation if they are valid UTF-8 sequences.

-       If  the  /?  modifier  is  used  with  /8,  it  causes pcretest to call
-       pcre_compile() with the  PCRE_NO_UTF8_CHECK  option,  to  suppress  the
+       If the /? modifier  is  used  with  /8,  it  causes  pcretest  to  call
+       pcre_compile()  with  the  PCRE_NO_UTF8_CHECK  option,  to suppress the
        checking of the string for UTF-8 validity.

DATA LINES

-       Before  each  data  line is passed to pcre_exec(), leading and trailing
-       whitespace is removed, and it is then scanned for \  escapes.  Some  of
-       these  are  pretty esoteric features, intended for checking out some of
-       the more complicated features of PCRE. If you are just  testing  "ordi-
-       nary"  regular  expressions,  you probably don't need any of these. The
+       Before each data line is passed to pcre_exec(),  leading  and  trailing
+       whitespace  is  removed,  and it is then scanned for \ escapes. Some of
+       these are pretty esoteric features, intended for checking out  some  of
+       the  more  complicated features of PCRE. If you are just testing "ordi-
+       nary" regular expressions, you probably don't need any  of  these.  The
        following escapes are recognized:

          \a         alarm (BEL, \x07)
@@ -323,7 +324,8 @@
          \M         discover the minimum MATCH_LIMIT and
                       MATCH_LIMIT_RECURSION settings
          \N         pass the PCRE_NOTEMPTY option to pcre_exec()
-                      or pcre_dfa_exec()
+                      or pcre_dfa_exec(); if used twice, pass the
+                      PCRE_NOTEMPTY_ATSTART option
          \Odd       set the size of the output vector passed to
                       pcre_exec() to dd (any number of digits)
          \P         pass the PCRE_PARTIAL_SOFT option to pcre_exec()
@@ -351,73 +353,73 @@
          \<any>     pass the PCRE_NEWLINE_ANY option to pcre_exec()
                       or pcre_dfa_exec()

-       The escapes that specify line ending  sequences  are  literal  strings,
+       The  escapes  that  specify  line ending sequences are literal strings,
        exactly as shown. No more than one newline setting should be present in
        any data line.

-       A backslash followed by anything else just escapes the  anything  else.
-       If  the very last character is a backslash, it is ignored. This gives a
-       way of passing an empty line as data, since a real  empty  line  termi-
+       A  backslash  followed by anything else just escapes the anything else.
+       If the very last character is a backslash, it is ignored. This gives  a
+       way  of  passing  an empty line as data, since a real empty line termi-
        nates the data input.

-       If  \M  is present, pcretest calls pcre_exec() several times, with dif-
-       ferent values in the match_limit and  match_limit_recursion  fields  of
-       the  pcre_extra  data structure, until it finds the minimum numbers for
+       If \M is present, pcretest calls pcre_exec() several times,  with  dif-
+       ferent  values  in  the match_limit and match_limit_recursion fields of
+       the pcre_extra data structure, until it finds the minimum  numbers  for
        each parameter that allow pcre_exec() to complete. The match_limit num-
-       ber  is  a  measure of the amount of backtracking that takes place, and
+       ber is a measure of the amount of backtracking that  takes  place,  and
        checking it out can be instructive. For most simple matches, the number
-       is  quite  small,  but for patterns with very large numbers of matching
-       possibilities, it can become large very quickly with increasing  length
+       is quite small, but for patterns with very large  numbers  of  matching
+       possibilities,  it can become large very quickly with increasing length
        of subject string. The match_limit_recursion number is a measure of how
-       much stack (or, if PCRE is compiled with  NO_RECURSE,  how  much  heap)
+       much  stack  (or,  if  PCRE is compiled with NO_RECURSE, how much heap)
        memory is needed to complete the match attempt.

-       When  \O  is  used, the value specified may be higher or lower than the
+       When \O is used, the value specified may be higher or  lower  than  the
        size set by the -O command line option (or defaulted to 45); \O applies
        only to the call of pcre_exec() for the line in which it appears.

-       If  the /P modifier was present on the pattern, causing the POSIX wrap-
-       per API to be used, the only option-setting  sequences  that  have  any
-       effect  are \B and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively,
+       If the /P modifier was present on the pattern, causing the POSIX  wrap-
+       per  API  to  be  used, the only option-setting sequences that have any
+       effect are \B and \Z, causing REG_NOTBOL and REG_NOTEOL,  respectively,
        to be passed to regexec().

-       The use of \x{hh...} to represent UTF-8 characters is not dependent  on
-       the  use  of  the  /8 modifier on the pattern. It is recognized always.
-       There may be any number of hexadecimal digits inside  the  braces.  The
-       result  is  from  one  to  six bytes, encoded according to the original
-       UTF-8 rules of RFC 2279. This allows for  values  in  the  range  0  to
-       0x7FFFFFFF.  Note  that not all of those are valid Unicode code points,
-       or indeed valid UTF-8 characters according to the later  rules  in  RFC
+       The  use of \x{hh...} to represent UTF-8 characters is not dependent on
+       the use of the /8 modifier on the pattern.  It  is  recognized  always.
+       There  may  be  any number of hexadecimal digits inside the braces. The
+       result is from one to six bytes,  encoded  according  to  the  original
+       UTF-8  rules  of  RFC  2279.  This  allows for values in the range 0 to
+       0x7FFFFFFF. Note that not all of those are valid Unicode  code  points,
+       or  indeed  valid  UTF-8 characters according to the later rules in RFC
        3629.

THE ALTERNATIVE MATCHING FUNCTION

-       By   default,  pcretest  uses  the  standard  PCRE  matching  function,
+       By  default,  pcretest  uses  the  standard  PCRE  matching   function,
        pcre_exec() to match each data line. From release 6.0, PCRE supports an
-       alternative  matching  function,  pcre_dfa_exec(),  which operates in a
-       different way, and has some restrictions. The differences  between  the
+       alternative matching function, pcre_dfa_exec(),  which  operates  in  a
+       different  way,  and has some restrictions. The differences between the
        two functions are described in the pcrematching documentation.

-       If  a data line contains the \D escape sequence, or if the command line
-       contains the -dfa option, the alternative matching function is  called.
+       If a data line contains the \D escape sequence, or if the command  line
+       contains  the -dfa option, the alternative matching function is called.
        This function finds all possible matches at a given point. If, however,
-       the \F escape sequence is present in the data line, it stops after  the
+       the  \F escape sequence is present in the data line, it stops after the
        first match is found. This is always the shortest possible match.

DEFAULT OUTPUT FROM PCRETEST

-       This  section  describes  the output when the normal matching function,
+       This section describes the output when the  normal  matching  function,
        pcre_exec(), is being used.

        When a match succeeds, pcretest outputs the list of captured substrings
-       that  pcre_exec()  returns,  starting with number 0 for the string that
-       matched the whole pattern. Otherwise, it outputs "No match" or "Partial
-       match:"  followed  by the partially matching substring when pcre_exec()
-       returns PCRE_ERROR_NOMATCH  or  PCRE_ERROR_PARTIAL,  respectively,  and
-       otherwise  the  PCRE  negative  error  number. Here is an example of an
-       interactive pcretest run.
+       that pcre_exec() returns, starting with number 0 for  the  string  that
+       matched  the  whole  pattern. Otherwise, it outputs "No match" when the
+       return is PCRE_ERROR_NOMATCH, and "Partial match:" followed by the par-
+       tially  matching substring when pcre_exec() returns PCRE_ERROR_PARTIAL.
+       For any other returns, it outputs the PCRE negative error number.  Here
+       is an example of an interactive pcretest run.

          $ pcretest
          PCRE version 7.0 30-Nov-2006
@@ -429,11 +431,11 @@
          data> xyz
          No match

-       Note that unset capturing substrings that are not followed by one  that
-       is  set are not returned by pcre_exec(), and are not shown by pcretest.
-       In the following example, there are two capturing substrings, but  when
-       the  first  data  line  is  matched, the second, unset substring is not
-       shown. An "internal" unset substring is shown as "<unset>", as for  the
+       Note  that unset capturing substrings that are not followed by one that
+       is set are not returned by pcre_exec(), and are not shown by  pcretest.
+       In  the following example, there are two capturing substrings, but when
+       the first data line is matched, the  second,  unset  substring  is  not
+       shown.  An "internal" unset substring is shown as "<unset>", as for the
        second data line.

            re> /(a)|(b)/
@@ -445,11 +447,11 @@
           1: <unset>
           2: b

-       If  the strings contain any non-printing characters, they are output as
-       \0x escapes, or as \x{...} escapes if the /8 modifier  was  present  on
-       the  pattern.  See below for the definition of non-printing characters.
-       If the pattern has the /+ modifier, the output for substring 0 is  fol-
-       lowed  by  the  rest  of  the  subject  string,  identified by "0+" like
+       If the strings contain any non-printing characters, they are output  as
+       \0x  escapes,  or  as \x{...} escapes if the /8 modifier was present on
+       the pattern. See below for the definition of  non-printing  characters.
+       If  the pattern has the /+ modifier, the output for substring 0 is fol-
+       lowed by the rest of  the  subject  string,  identified  by  "0+"  like
        this:

            re> /cat/+
@@ -457,7 +459,7 @@
           0: cat
           0+ aract

-       If the pattern has the /g or /G modifier,  the  results  of  successive
+       If  the  pattern  has  the /g or /G modifier, the results of successive
        matching attempts are output in sequence, like this:

            re> /\Bi(\w\w)/g
@@ -471,24 +473,24 @@

        "No match" is output only if the first match attempt fails.

-       If  any  of the sequences \C, \G, or \L are present in a data line that
-       is successfully matched, the substrings extracted  by  the  convenience
+       If any of the sequences \C, \G, or \L are present in a data  line  that
+       is  successfully  matched,  the substrings extracted by the convenience
        functions are output with C, G, or L after the string number instead of
        a colon. This is in addition to the normal full list. The string length
-       (that  is,  the return from the extraction function) is given in paren-
+       (that is, the return from the extraction function) is given  in  paren-
        theses after each string for \C and \G.

        Note that whereas patterns can be continued over several lines (a plain
        ">" prompt is used for continuations), data lines may not. However new-
-       lines can be included in data by means of the \n escape (or  \r,  \r\n,
+       lines  can  be included in data by means of the \n escape (or \r, \r\n,
        etc., depending on the newline sequence setting).

OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

-       When  the  alternative  matching function, pcre_dfa_exec(), is used (by
-       means of the \D escape sequence or the -dfa command line  option),  the
-       output  consists  of  a list of all the matches that start at the first
+       When the alternative matching function, pcre_dfa_exec(),  is  used  (by
+       means  of  the \D escape sequence or the -dfa command line option), the
+       output consists of a list of all the matches that start  at  the  first
        point in the subject where there is at least one match. For example:

            re> /(tang|tangerine|tan)/
@@ -497,8 +499,8 @@
           1: tang
           2: tan

-       (Using the normal matching function on this data  finds  only  "tang".)
-       The  longest matching string is always given first (and numbered zero).
+       (Using  the  normal  matching function on this data finds only "tang".)
+       The longest matching string is always given first (and numbered  zero).
        After a PCRE_ERROR_PARTIAL return, the output is "Partial match:", fol-
        lowed by the partially matching substring.

@@ -514,16 +516,16 @@
           1: tan
           0: tan

-       Since the matching function does not  support  substring  capture,  the
-       escape  sequences  that  are concerned with captured substrings are not
+       Since  the  matching  function  does not support substring capture, the
+       escape sequences that are concerned with captured  substrings  are  not
        relevant.

RESTARTING AFTER A PARTIAL MATCH

        When the alternative matching function has given the PCRE_ERROR_PARTIAL
-       return,  indicating that the subject partially matched the pattern, you
-       can restart the match with additional subject data by means of  the  \R
+       return, indicating that the subject partially matched the pattern,  you
+       can  restart  the match with additional subject data by means of the \R
        escape sequence. For example:

            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@@ -532,30 +534,30 @@
          data> n05\R\D
           0: n05

-       For  further  information  about  partial matching, see the pcrepartial
+       For further information about partial  matching,  see  the  pcrepartial
        documentation.

CALLOUTS

-       If the pattern contains any callout requests, pcretest's callout  func-
-       tion  is  called  during  matching. This works with both matching func-
+       If  the pattern contains any callout requests, pcretest's callout func-
+       tion is called during matching. This works  with  both  matching  func-
        tions. By default, the called function displays the callout number, the
-       start  and  current  positions in the text at the callout time, and the
+       start and current positions in the text at the callout  time,  and  the
        next pattern item to be tested. For example, the output

          --->pqrabcdef
            0    ^  ^     \d

-       indicates that callout number 0 occurred for a match  attempt  starting
-       at  the fourth character of the subject string, when the pointer was at
-       the seventh character of the data, and when the next pattern  item  was
-       \d.  Just  one  circumflex is output if the start and current positions
+       indicates  that  callout number 0 occurred for a match attempt starting
+       at the fourth character of the subject string, when the pointer was  at
+       the  seventh  character of the data, and when the next pattern item was
+       \d. Just one circumflex is output if the start  and  current  positions
        are the same.

        Callouts numbered 255 are assumed to be automatic callouts, inserted as
-       a  result  of the /C pattern modifier. In this case, instead of showing
-       the callout number, the offset in the pattern, preceded by a  plus,  is
+       a result of the /C pattern modifier. In this case, instead  of  showing
+       the  callout  number, the offset in the pattern, preceded by a plus, is
        output. For example:

            re> /\d?[A-E]\*/C
@@ -567,86 +569,86 @@
          +10 ^ ^
           0: E*

-       The  callout  function  in pcretest returns zero (carry on matching) by
-       default, but you can use a \C item in a data line (as described  above)
+       The callout function in pcretest returns zero (carry  on  matching)  by
+       default,  but you can use a \C item in a data line (as described above)
        to change this.

-       Inserting  callouts can be helpful when using pcretest to check compli-
-       cated regular expressions. For further information about callouts,  see
+       Inserting callouts can be helpful when using pcretest to check  compli-
+       cated  regular expressions. For further information about callouts, see
        the pcrecallout documentation.

NON-PRINTING CHARACTERS

-       When  pcretest is outputting text in the compiled version of a pattern,
-       bytes other than 32-126 are always treated as  non-printing  characters
+       When pcretest is outputting text in the compiled version of a  pattern,
+       bytes  other  than 32-126 are always treated as non-printing characters
        and are therefore shown as hex escapes.

-       When  pcretest  is  outputting text that is a matched part of a subject
-       string, it behaves in the same way, unless a different locale has  been
-       set  for  the  pattern  (using  the  /L  modifier).  In  this case, the
+       When pcretest is outputting text that is a matched part  of  a  subject
+       string,  it behaves in the same way, unless a different locale has been
+       set for the  pattern  (using  the  /L  modifier).  In  this  case,  the
        isprint() function is used to distinguish printing and non-printing
        characters.

SAVING AND RELOADING COMPILED PATTERNS

-       The facilities described in this section are  not  available  when  the
+       The  facilities  described  in  this section are not available when the
        POSIX interface to PCRE is being used, that is, when the /P pattern mod-
        ifier is specified.

        When the POSIX interface is not in use, you can cause pcretest to write
-       a  compiled  pattern to a file, by following the modifiers with > and a
+       a compiled pattern to a file, by following the modifiers with >  and  a
        file name.  For example:

          /pattern/im >/some/file

-       See the pcreprecompile documentation for a discussion about saving  and
+       See  the pcreprecompile documentation for a discussion about saving and
        re-using compiled patterns.

-       The  data  that  is  written  is  binary. The first eight bytes are the
-       length of the compiled pattern data  followed  by  the  length  of  the
-       optional  study  data,  each  written as four bytes in big-endian order
-       (most significant byte first). If there is no study  data  (either  the
+       The data that is written is binary.  The  first  eight  bytes  are  the
+       length  of  the  compiled  pattern  data  followed by the length of the
+       optional study data, each written as four  bytes  in  big-endian  order
+       (most  significant  byte  first). If there is no study data (either the
        pattern was not studied, or studying did not return any data), the sec-
-       ond length is zero. The lengths are followed by an exact  copy  of  the
+       ond  length  is  zero. The lengths are followed by an exact copy of the
        compiled pattern. If there is additional study data, this follows imme-
-       diately after the compiled pattern. After writing  the  file,  pcretest
+       diately  after  the  compiled pattern. After writing the file, pcretest
        expects to read a new pattern.

        A saved pattern can be reloaded into pcretest by specifying < and a file
-       name instead of a pattern. The name of the file must not  contain  a  <
-       character,  as  otherwise pcretest will interpret the line as a pattern
+       name  instead  of  a pattern. The name of the file must not contain a <
+       character, as otherwise pcretest will interpret the line as  a  pattern
        delimited by < characters.  For example:

           re> </some/file
          Compiled regex loaded from /some/file
          No study data

-       When the pattern has been loaded, pcretest proceeds to read data  lines
+       When  the pattern has been loaded, pcretest proceeds to read data lines
        in the usual way.

-       You  can copy a file written by pcretest to a different host and reload
-       it there, even if the new host has opposite endianness to  the  one  on
-       which  the pattern was compiled. For example, you can compile on an i86
+       You can copy a file written by pcretest to a different host and  reload
+       it  there,  even  if the new host has opposite endianness to the one on
+       which the pattern was compiled. For example, you can compile on an  i86
        machine and run on a SPARC machine.

-       File names for saving and reloading can be absolute  or  relative,  but
-       note  that the shell facility of expanding a file name that starts with
+       File  names  for  saving and reloading can be absolute or relative, but
+       note that the shell facility of expanding a file name that starts  with
        a tilde (~) is not available.

-       The ability to save and reload files in pcretest is intended for  test-
-       ing  and experimentation. It is not intended for production use because
-       only a single pattern can be written to a file. Furthermore,  there  is
-       no  facility  for  supplying  custom  character  tables  for use with a
-       reloaded pattern. If the original  pattern  was  compiled  with  custom
-       tables,  an  attempt to match a subject string using a reloaded pattern
-       is likely to cause pcretest to crash.  Finally, if you attempt to  load
+       The  ability to save and reload files in pcretest is intended for test-
+       ing and experimentation. It is not intended for production use  because
+       only  a  single pattern can be written to a file. Furthermore, there is
+       no facility for supplying  custom  character  tables  for  use  with  a
+       reloaded  pattern.  If  the  original  pattern was compiled with custom
+       tables, an attempt to match a subject string using a  reloaded  pattern
+       is  likely to cause pcretest to crash.  Finally, if you attempt to load
        a file that is not in the correct format, the result is undefined.
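
The saved-file layout described in the last section of the diff (two four-byte big-endian lengths, then the compiled pattern, then any study data) can be illustrated with a short C sketch. This is not pcretest's own code: the names saved_header, read_be32 and parse_saved_header are hypothetical, and the sketch only decodes the eight-byte header from a buffer that the caller has already read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a 4-byte big-endian
   (most significant byte first) unsigned value. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical view of the 8-byte header described in the text. */
typedef struct {
    uint32_t pattern_len;  /* length of the compiled pattern data */
    uint32_t study_len;    /* length of the study data, 0 if none */
} saved_header;

/* Decode the header from the first 8 bytes of buf.
   Returns 0 on success, -1 if fewer than 8 bytes are available. */
static int parse_saved_header(const unsigned char *buf, size_t len,
                              saved_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 8)
        return -1;
    out->pattern_len = read_be32(buf);      /* bytes 0..3 */
    out->study_len   = read_be32(buf + 4);  /* bytes 4..7 */
    return 0;
}
```

Under these assumptions, the compiled pattern would start at offset 8 and any study data at offset 8 + pattern_len. Decoding byte by byte, rather than casting the buffer to an integer, gives the same result on hosts of either endianness.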

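The DATA LINES text in the diff says that \x{hh...} produces one to six bytes, encoded by the original UTF-8 rules of RFC 2279, for values from 0 to 0x7FFFFFFF. The following C sketch shows one way to do that encoding. It is an illustration, not the code pcretest uses, and the function name encode_utf8_rfc2279 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode cp (0..0x7FFFFFFF) using the original RFC 2279 UTF-8 rules.
   out must have room for 6 bytes. Returns the number of bytes written
   (1 to 6), or 0 if cp is out of range. */
static size_t encode_utf8_rfc2279(uint32_t cp, unsigned char out[6])
{
    /* Largest value representable with 0..5 continuation bytes. */
    static const uint32_t limit[6] = {
        0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF
    };
    /* Marker bits of the leading byte for each sequence length. */
    static const unsigned char lead[6] = {
        0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC
    };
    size_t n, i;

    assert(out != NULL);
    if (cp > 0x7FFFFFFFu)
        return 0;

    /* n becomes the number of continuation bytes needed. */
    for (n = 0; n < 6; n++)
        if (cp <= limit[n])
            break;

    /* Fill continuation bytes from the end, 6 bits at a time. */
    for (i = n; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    out[0] = (unsigned char)(lead[n] | cp);
    return n + 1;
}
```

For example, 0xE9 encodes as the two bytes C3 A9 and 0x20AC as E2 82 AC. Values above 0x10FFFF and the surrogate range are accepted here because RFC 2279 allowed them, which matches the text's remark that not every result is valid under RFC 3629.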