[Pcre-svn] [461] code/trunk: Tidy up, remove trailing spaces…

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [461] code/trunk: Tidy up, remove trailing spaces, etc.
Revision: 461
          http://vcs.pcre.org/viewvc?view=rev&revision=461
Author:   ph10
Date:     2009-10-05 11:59:35 +0100 (Mon, 05 Oct 2009)


Log Message:
-----------
Tidy up, remove trailing spaces, etc. for 8.00-RC1.

Modified Paths:
--------------
    code/trunk/132html
    code/trunk/ChangeLog
    code/trunk/LICENCE
    code/trunk/NEWS
    code/trunk/NON-UNIX-USE
    code/trunk/README
    code/trunk/RunGrepTest
    code/trunk/RunTest
    code/trunk/configure.ac
    code/trunk/doc/html/index.html
    code/trunk/doc/html/pcre.html
    code/trunk/doc/html/pcre_compile.html
    code/trunk/doc/html/pcre_compile2.html
    code/trunk/doc/html/pcre_dfa_exec.html
    code/trunk/doc/html/pcre_exec.html
    code/trunk/doc/html/pcre_fullinfo.html
    code/trunk/doc/html/pcreapi.html
    code/trunk/doc/html/pcrebuild.html
    code/trunk/doc/html/pcrecallout.html
    code/trunk/doc/html/pcrecompat.html
    code/trunk/doc/html/pcregrep.html
    code/trunk/doc/html/pcrematching.html
    code/trunk/doc/html/pcrepartial.html
    code/trunk/doc/html/pcrepattern.html
    code/trunk/doc/html/pcreposix.html
    code/trunk/doc/html/pcresample.html
    code/trunk/doc/html/pcretest.html
    code/trunk/doc/pcre.txt
    code/trunk/doc/pcre_compile2.3
    code/trunk/doc/pcre_dfa_exec.3
    code/trunk/doc/pcre_exec.3
    code/trunk/doc/pcre_fullinfo.3
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcrebuild.3
    code/trunk/doc/pcrecallout.3
    code/trunk/doc/pcrecompat.3
    code/trunk/doc/pcregrep.1
    code/trunk/doc/pcrematching.3
    code/trunk/doc/pcrepartial.3
    code/trunk/doc/pcrepattern.3
    code/trunk/doc/pcreposix.3
    code/trunk/doc/pcresample.3
    code/trunk/doc/pcretest.1
    code/trunk/doc/pcretest.txt
    code/trunk/doc/perltest.txt
    code/trunk/pcre_compile.c
    code/trunk/pcre_dfa_exec.c
    code/trunk/pcre_exec.c
    code/trunk/pcre_fullinfo.c
    code/trunk/pcre_internal.h
    code/trunk/pcre_printint.src
    code/trunk/pcre_study.c
    code/trunk/pcre_try_flipped.c
    code/trunk/pcregrep.c
    code/trunk/pcreposix.c
    code/trunk/pcretest.c
    code/trunk/perltest.pl
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput2


Modified: code/trunk/132html
===================================================================
--- code/trunk/132html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/132html    2009-10-05 10:59:35 UTC (rev 461)
@@ -231,23 +231,23 @@
       $_ = "$one $two";
       redo;            # Process the joined lines
       }
-      
+
     # .EX/.EE are used in the pcredemo page to bracket the entire program,
     # which is unmodified except for turning backslash into "\e".
-    
+
     elsif (/^\.EX\s*$/)
       {
       print TEMP "<PRE>\n";
       while (<STDIN>)
         {
-        last if /^\.EE\s*$/; 
+        last if /^\.EE\s*$/;
         s/\\e/\\/g;
-        s/&/&amp;/g;   
+        s/&/&amp;/g;
         s/</&lt;/g;
         s/>/&gt;/g;
-        print TEMP; 
-        }   
-      }     
+        print TEMP;
+        }
+      }


     # Ignore anything not recognized
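
The .EX/.EE branch in the hunk above copies lines verbatim into a <PRE> block, undoing the "\e" escape and then escaping HTML metacharacters. A minimal Python sketch of the same per-line transformation follows. The function name convert_ex_block is hypothetical and is not part of 132html. The "&" substitution has to come first, otherwise the ampersands introduced by "&lt;" and "&gt;" would themselves be escaped.

```python
import re


def convert_ex_block(lines):
    """Return the HTML-escaped body of a .EX/.EE block.

    `lines` is an iterable of input lines that follow the .EX marker.
    Processing stops at the first line matching /^\\.EE\\s*$/, as in the
    Perl loop. This is an illustrative re-statement, not the real script.
    """
    out = []
    for line in lines:
        if re.match(r"^\.EE\s*$", line):
            break
        line = line.replace("\\e", "\\")   # s/\\e/\\/g : "\e" back to a backslash
        line = line.replace("&", "&amp;")  # must run before < and > are escaped
        line = line.replace("<", "&lt;")
        line = line.replace(">", "&gt;")
        out.append(line)
    return out
```

Anything after the .EE line is left for the caller, in the same way that the Perl loop leaves the rest of STDIN for the enclosing loop.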



Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/ChangeLog    2009-10-05 10:59:35 UTC (rev 461)
@@ -1,171 +1,171 @@
 ChangeLog for PCRE
 ------------------


-Version 8.00 ??-???-??
+Version 8.00 05-Oct-09
 ----------------------

 1.  The table for translating pcre_compile() error codes into POSIX error codes
-    was out-of-date, and there was no check on the pcre_compile() error code 
-    being within the table. This could lead to an OK return being given in 
+    was out-of-date, and there was no check on the pcre_compile() error code
+    being within the table. This could lead to an OK return being given in
     error.
-    
-2.  Changed the call to open a subject file in pcregrep from fopen(pathname, 
-    "r") to fopen(pathname, "rb"), which fixed a problem with some of the tests 
-    in a Windows environment. 
-    
+
+2.  Changed the call to open a subject file in pcregrep from fopen(pathname,
+    "r") to fopen(pathname, "rb"), which fixed a problem with some of the tests
+    in a Windows environment.
+
 3.  The pcregrep --count option prints the count for each file even when it is
     zero, as does GNU grep. However, pcregrep was also printing all files when
     --files-with-matches was added. Now, when both options are given, it prints
     counts only for those files that have at least one match. (GNU grep just
-    prints the file name in this circumstance, but including the count seems 
-    more useful - otherwise, why use --count?) Also ensured that the 
+    prints the file name in this circumstance, but including the count seems
+    more useful - otherwise, why use --count?) Also ensured that the
     combination -clh just lists non-zero counts, with no names.
-    
-4.  The long form of the pcregrep -F option was incorrectly implemented as 
-    --fixed_strings instead of --fixed-strings. This is an incompatible change, 
-    but it seems right to fix it, and I didn't think it was worth preserving 
-    the old behaviour. 
-    
-5.  The command line items --regex=pattern and --regexp=pattern were not 
+
+4.  The long form of the pcregrep -F option was incorrectly implemented as
+    --fixed_strings instead of --fixed-strings. This is an incompatible change,
+    but it seems right to fix it, and I didn't think it was worth preserving
+    the old behaviour.
+
+5.  The command line items --regex=pattern and --regexp=pattern were not
     recognized by pcregrep, which required --regex pattern or --regexp pattern
-    (with a space rather than an '='). The man page documented the '=' forms, 
+    (with a space rather than an '='). The man page documented the '=' forms,
     which are compatible with GNU grep; these now work.
-    
-6.  No libpcreposix.pc file was created for pkg-config; there was just 
+
+6.  No libpcreposix.pc file was created for pkg-config; there was just
     libpcre.pc and libpcrecpp.pc. The omission has been rectified.
-    
+
 7.  Added #ifndef SUPPORT_UCP into the pcre_ucd.c module, to reduce its size
-    when UCP support is not needed, by modifying the Python script that 
+    when UCP support is not needed, by modifying the Python script that
     generates it from Unicode data files. This should not matter if the module
     is correctly used as a library, but I received one complaint about 50K of
     unwanted data. My guess is that the person linked everything into his
     program rather than using a library. Anyway, it does no harm.
-    
+
 8.  A pattern such as /\x{123}{2,2}+/8 was incorrectly compiled; the trigger
-    was a minimum greater than 1 for a wide character in a possessive 
+    was a minimum greater than 1 for a wide character in a possessive
     repetition. The same bug could also affect patterns like /(\x{ff}{0,2})*/8
     which had an unlimited repeat of a nested, fixed maximum repeat of a wide
     character. Chaos in the form of incorrect output or a compiling loop could
     result.
-    
+
 9.  The restrictions on what a pattern can contain when partial matching is
-    requested for pcre_exec() have been removed. All patterns can now be 
+    requested for pcre_exec() have been removed. All patterns can now be
     partially matched by this function. In addition, if there are at least two
     slots in the offset vector, the offset of the earliest inspected character
     for the match and the offset of the end of the subject are set in them when
-    PCRE_ERROR_PARTIAL is returned. 
-    
+    PCRE_ERROR_PARTIAL is returned.
+
 10. Partial matching has been split into two forms: PCRE_PARTIAL_SOFT, which is
     synonymous with PCRE_PARTIAL, for backwards compatibility, and
     PCRE_PARTIAL_HARD, which causes a partial match to supersede a full match,
     and may be more useful for multi-segment matching, especially with
     pcre_exec().
-    
-11. Partial matching with pcre_exec() is now more intuitive. A partial match 
-    used to be given if ever the end of the subject was reached; now it is 
-    given only if matching could not proceed because another character was 
-    needed. This makes a difference in some odd cases such as Z(*FAIL) with the 
-    string "Z", which now yields "no match" instead of "partial match". In the 
-    case of pcre_dfa_exec(), "no match" is given if every matching path for the 
-    final character ended with (*FAIL). 
-    
+
+11. Partial matching with pcre_exec() is now more intuitive. A partial match
+    used to be given if ever the end of the subject was reached; now it is
+    given only if matching could not proceed because another character was
+    needed. This makes a difference in some odd cases such as Z(*FAIL) with the
+    string "Z", which now yields "no match" instead of "partial match". In the
+    case of pcre_dfa_exec(), "no match" is given if every matching path for the
+    final character ended with (*FAIL).
+
 12. Restarting a match using pcre_dfa_exec() after a partial match did not work
-    if the pattern had a "must contain" character that was already found in the 
+    if the pattern had a "must contain" character that was already found in the
     earlier partial match, unless partial matching was again requested. For
     example, with the pattern /dog.(body)?/, the "must contain" character is
     "g". If the first part-match was for the string "dog", restarting with
     "sbody" failed. This bug has been fixed.
-    
-13. The string returned by pcre_dfa_exec() after a partial match has been 
-    changed so that it starts at the first inspected character rather than the 
-    first character of the match. This makes a difference only if the pattern 
-    starts with a lookbehind assertion or \b or \B (\K is not supported by 
-    pcre_dfa_exec()). It's an incompatible change, but it makes the two 
+
+13. The string returned by pcre_dfa_exec() after a partial match has been
+    changed so that it starts at the first inspected character rather than the
+    first character of the match. This makes a difference only if the pattern
+    starts with a lookbehind assertion or \b or \B (\K is not supported by
+    pcre_dfa_exec()). It's an incompatible change, but it makes the two
     matching functions compatible, and I think it's the right thing to do.
-    
+
 14. Added a pcredemo man page, created automatically from the pcredemo.c file,
-    so that the demonstration program is easily available in environments where 
-    PCRE has not been installed from source.  
-    
+    so that the demonstration program is easily available in environments where
+    PCRE has not been installed from source.
+
 15. Arranged to add -DPCRE_STATIC to cflags in libpcre.pc, libpcreposix.cp,
     libpcrecpp.pc and pcre-config when PCRE is not compiled as a shared
     library.
-    
+
 16. Added REG_UNGREEDY to the pcreposix interface, at the request of a user.
     It maps to PCRE_UNGREEDY. It is not, of course, POSIX-compatible, but it
-    is not the first non-POSIX option to be added. Clearly some people find 
+    is not the first non-POSIX option to be added. Clearly some people find
     these options useful.
-    
-17. If a caller to the POSIX matching function regexec() passes a non-zero 
+
+17. If a caller to the POSIX matching function regexec() passes a non-zero
     value for nmatch with a NULL value for pmatch, the value of
-    nmatch is forced to zero. 
-    
+    nmatch is forced to zero.
+
 18. RunGrepTest did not have a test for the availability of the -u option of
-    the diff command, as RunTest does. It now checks in the same way as 
+    the diff command, as RunTest does. It now checks in the same way as
     RunTest, and also checks for the -b option.
-    
+
 19. If an odd number of negated classes containing just a single character
     interposed, within parentheses, between a forward reference to a named
-    subpattern and the definition of the subpattern, compilation crashed with 
-    an internal error, complaining that it could not find the referenced 
+    subpattern and the definition of the subpattern, compilation crashed with
+    an internal error, complaining that it could not find the referenced
     subpattern. An example of a crashing pattern is /(?&A)(([^m])(?<A>))/.
-    [The bug was that it was starting one character too far in when skipping 
-    over the character class, thus treating the ] as data rather than 
-    terminating the class. This meant it could skip too much.] 
-    
+    [The bug was that it was starting one character too far in when skipping
+    over the character class, thus treating the ] as data rather than
+    terminating the class. This meant it could skip too much.]
+
 20. Added PCRE_NOTEMPTY_ATSTART in order to be able to correctly implement the
-    /g option in pcretest when the pattern contains \K, which makes it possible 
+    /g option in pcretest when the pattern contains \K, which makes it possible
     to have an empty string match not at the start, even when the pattern is
-    anchored. Updated pcretest and pcredemo to use this option.  
-    
+    anchored. Updated pcretest and pcredemo to use this option.
+
 21. If the maximum number of capturing subpatterns in a recursion was greater
-    than the maximum at the outer level, the higher number was returned, but 
-    with unset values at the outer level. The correct (outer level) value is 
+    than the maximum at the outer level, the higher number was returned, but
+    with unset values at the outer level. The correct (outer level) value is
     now given.
-    
+
 22. If (*ACCEPT) appeared inside capturing parentheses, previous releases of
     PCRE did not set those parentheses (unlike Perl). I have now found a way to
     make it do so. The string so far is captured, making this feature
     compatible with Perl.
-    
-23. The tests have been re-organized, adding tests 11 and 12, to make it 
+
+23. The tests have been re-organized, adding tests 11 and 12, to make it
     possible to check the Perl 5.10 features against Perl 5.10.
-    
+
 24. Perl 5.10 allows subroutine calls in lookbehinds, as long as the subroutine
-    pattern matches a fixed length string. PCRE did not allow this; now it 
-    does. Neither allows recursion. 
-    
-25. I finally figured out how to implement a request to provide the minimum 
-    length of subject string that was needed in order to match a given pattern. 
-    (It was back references and recursion that I had previously got hung up 
-    on.) This code has now been added to pcre_study(); it finds a lower bound 
+    pattern matches a fixed length string. PCRE did not allow this; now it
+    does. Neither allows recursion.
+
+25. I finally figured out how to implement a request to provide the minimum
+    length of subject string that was needed in order to match a given pattern.
+    (It was back references and recursion that I had previously got hung up
+    on.) This code has now been added to pcre_study(); it finds a lower bound
     to the length of subject needed. It is not necessarily the greatest lower
     bound, but using it to avoid searching strings that are too short does give
     some useful speed-ups. The value is available to calling programs via
     pcre_fullinfo().
-    
+
 26. While implementing 25, I discovered to my embarrassment that pcretest had
     not been passing the result of pcre_study() to pcre_dfa_exec(), so the
     study optimizations had never been tested with that matching function.
     Oops. What is worse, even when it was passed study data, there was a bug in
     pcre_dfa_exec() that meant it never actually used it. Double oops. There
     were also very few tests of studied patterns with pcre_dfa_exec().
-    
+
 27. If (?| is used to create subpatterns with duplicate numbers, they are now
     allowed to have the same name, even if PCRE_DUPNAMES is not set. However,
     on the other side of the coin, they are no longer allowed to have different
     names, because these cannot be distinguished in PCRE, and this has caused
     confusion. (This is a difference from Perl.)
-    
-28. When duplicate subpattern names are present (necessarily with different 
-    numbers, as required by 27 above), and a test is made by name in a 
-    conditional pattern, either for a subpattern having been matched, or for 
-    recursion in such a pattern, all the associated numbered subpatterns are 
+
+28. When duplicate subpattern names are present (necessarily with different
+    numbers, as required by 27 above), and a test is made by name in a
+    conditional pattern, either for a subpattern having been matched, or for
+    recursion in such a pattern, all the associated numbered subpatterns are
     tested, and the overall condition is true if the condition is true for any
     one of them. This is the way Perl works, and is also more like the way
     testing by number works.
-    


+
 Version 7.9 11-Apr-09
 ---------------------
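
ChangeLog item 25 above describes a lower bound on the subject length needed for a match, used to skip subjects that are too short. The following Python sketch illustrates the idea on a toy pattern tree. It is not the pcre_study() code, and the node kinds ("lit", "any", "seq", "alt", "repeat", "backref") are assumptions made for this example only. Treating a back reference as length zero is always safe but can underestimate, which is one way a bound of this kind may fail to be the greatest lower bound.

```python
def min_length(node):
    """Lower bound on the length of any string matched by a toy pattern tree."""
    kind = node[0]
    if kind == "lit":                 # ("lit", "dog")
        return len(node[1])
    if kind == "any":                 # ("any",) matches exactly one character
        return 1
    if kind == "seq":                 # ("seq", [child, ...])
        return sum(min_length(child) for child in node[1])
    if kind == "alt":                 # ("alt", [child, ...])
        return min(min_length(child) for child in node[1])
    if kind == "repeat":              # ("repeat", child, minimum_count)
        return node[2] * min_length(node[1])
    if kind == "backref":             # conservative: assume it can match empty
        return 0
    raise ValueError("unknown node kind: %r" % (kind,))


def worth_trying(pattern, subject):
    """Cheap pre-check: a subject shorter than the bound cannot match."""
    return len(subject) >= min_length(pattern)


# Toy tree for the ChangeLog's example pattern /dog.(body)?/
DOG_BODY = ("seq", [("lit", "dog"), ("any",), ("repeat", ("lit", "body"), 0)])
```

For DOG_BODY the bound is 4, so a three-character subject such as "dog" can be rejected without running the matcher at all.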


Modified: code/trunk/LICENCE
===================================================================
--- code/trunk/LICENCE    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/LICENCE    2009-10-05 10:59:35 UTC (rev 461)
@@ -4,7 +4,7 @@
 PCRE is a library of functions to support regular expressions whose syntax
 and semantics are as close as possible to those of the Perl 5 language.


-Release 7 of PCRE is distributed under the terms of the "BSD" licence, as
+Release 8 of PCRE is distributed under the terms of the "BSD" licence, as
specified below. The documentation for PCRE, supplied in the "doc"
directory, is distributed under the same terms as the software itself.


Modified: code/trunk/NEWS
===================================================================
--- code/trunk/NEWS    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/NEWS    2009-10-05 10:59:35 UTC (rev 461)
@@ -1,6 +1,21 @@
 News about PCRE releases
 ------------------------


+Release 8.00 05-Oct-09
+----------------------
+
+Bugs have been fixed in the library and in pcregrep. There are also some
+enhancements. Restrictions on patterns used for partial matching have been
+removed, extra information is given for partial matches, the partial matching
+process has been improved, and an option to make a partial match override a
+full match is available. The "study" process has been enhanced by finding a
+lower bound matching length. Groups with duplicate numbers may now have
+duplicated names without the use of PCRE_DUPNAMES. However, they may not have
+different names. The documentation has been revised to reflect these changes.
+The version number has been expanded to 3 digits as it is clear that the rate
+of change is not slowing down.
+
+
 Release 7.9 11-Apr-09
 ---------------------
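
The NEWS entry above mentions an option that makes a partial match override a full match. The Python sketch below is a toy model of that distinction and nothing more. It matches a set of literal strings at the start of the subject, ignores alternative ordering and backtracking, and does not call PCRE. The function name and the "hard" parameter are hypothetical stand-ins for the PCRE_PARTIAL_SOFT and PCRE_PARTIAL_HARD behaviours described in the ChangeLog.

```python
def partial_match(alternatives, subject, hard=False):
    """Classify `subject` against literal alternatives anchored at its start.

    Returns "match", "partial" or "no match". A partial match is reported
    only when some alternative needs more characters than the subject
    supplies, which mirrors "another character was needed" in item 11.
    """
    full = any(subject.startswith(alt) for alt in alternatives)
    partial = any(
        len(alt) > len(subject) and alt.startswith(subject)
        for alt in alternatives
    )
    if hard and partial:      # hard: a possible longer match wins
        return "partial"
    if full:                  # soft: a complete match is preferred
        return "match"
    if partial:
        return "partial"
    return "no match"
```

With the alternatives "dog" and "dogsbody" and the subject "dogs", the soft form reports a match (for "dog"), while the hard form reports a partial match because "dogsbody" could still be completed by a later segment.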


Modified: code/trunk/NON-UNIX-USE
===================================================================
--- code/trunk/NON-UNIX-USE    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/NON-UNIX-USE    2009-10-05 10:59:35 UTC (rev 461)
@@ -12,10 +12,10 @@
   Comments about Win32 builds
   Building PCRE on Windows with CMake
   Use of relative paths with CMake on Windows
-  Testing with runtest.bat
+  Testing with RunTest.bat
   Building under Windows with BCC5.5
   Building PCRE on OpenVMS
-  Building PCRE on Stratus OpenVOS 
+  Building PCRE on Stratus OpenVOS



GENERAL
@@ -37,10 +37,10 @@

The PCRE distribution includes a "configure" file for use by the Configure/Make
build system, as found in many Unix-like environments. There is also support
-support for CMake, which some users prefer, in particular in Windows
-environments. There are some instructions for CMake under Windows in the
-section entitled "Building PCRE with CMake" below. CMake can also be used to
-build PCRE in Unix-like systems.
+support for CMake, which some users prefer, especially in Windows environments.
+There are some instructions for CMake under Windows in the section entitled
+"Building PCRE with CMake" below. CMake can also be used to build PCRE in
+Unix-like systems.


 GENERIC INSTRUCTIONS FOR THE PCRE C LIBRARY
@@ -304,10 +304,10 @@
 7.  Select the particular IDE / build tool that you are using (Visual
     Studio, MSYS makefiles, MinGW makefiles, etc.)


-8.  The GUI will then list several configuration options. This is where 
+8.  The GUI will then list several configuration options. This is where
     you can enable UTF-8 support or other PCRE optional features.


-9.  Hit "Configure" again. The adjacent "Generate" button should now be 
+9.  Hit "Configure" again. The adjacent "Generate" button should now be
     active.


10. Hit "Generate".
@@ -460,7 +460,7 @@
problems. I used the following packages to build PCRE:

   ftp://ftp.stratus.com/pub/vos/posix/ga/posix.save.evf.gz
-     
+
 Please read and follow the instructions that come with these packages. To start
 the build of pcre, from the root of the package type:


@@ -494,5 +494,5 @@


=========================
-Last Updated: 09 September 2009
+Last Updated: 05 October 2009
****

Modified: code/trunk/README
===================================================================
--- code/trunk/README    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/README    2009-10-05 10:59:35 UTC (rev 461)
@@ -24,7 +24,7 @@
   Shared libraries on Unix-like systems
   Cross-compiling on Unix-like systems
   Using HP's ANSI C++ compiler (aCC)
-  Using PCRE from MySQL 
+  Using PCRE from MySQL
   Making new tarballs
   Testing PCRE
   Character tables
@@ -477,16 +477,16 @@
 running the "configure" script:


CXXLDFLAGS="-lstd_v2 -lCsup_v2"
-

+
 Using PCRE from MySQL
 ---------------------

-On systems where both PCRE and MySQL are installed, it is possible to make use
-of PCRE from within MySQL, as an alternative to the built-in pattern matching.
+On systems where both PCRE and MySQL are installed, it is possible to make use
+of PCRE from within MySQL, as an alternative to the built-in pattern matching.
There is a web page that tells you how to do this:

- http://www.mysqludf.org/lib_mysqludf_preg/index.php
+ http://www.mysqludf.org/lib_mysqludf_preg/index.php


Making new tarballs
@@ -564,23 +564,33 @@

The fourth test checks the UTF-8 support. It is not run automatically unless
PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
-running "configure". This file can be also fed directly to the perltest script,
-provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
-commented in the script, can be be used.)
+running "configure". This file can be also fed directly to the perltest.pl
+script, provided you are running Perl 5.8 or higher.

The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
features of PCRE that are not relevant to Perl.

-The sixth test checks the support for Unicode character properties. It it not
-run automatically unless PCRE is built with Unicode property support. To to
-this you must set --enable-unicode-properties when running "configure".
+The sixth test (which is Perl-5.10 compatible) checks the support for Unicode
+character properties. It it not run automatically unless PCRE is built with
+Unicode property support. To to this you must set --enable-unicode-properties
+when running "configure".

The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
property support, respectively. The eighth and ninth tests are not run
automatically unless PCRE is build with the relevant support.

+The tenth test checks some internal offsets and code size features; it is run
+only when the default "link size" of 2 is set (in other cases the sizes
+change).

+The eleventh test checks out features that are new in Perl 5.10, and the
+twelfth test checks a number internals and non-Perl features concerned with
+Unicode property support. It it not run automatically unless PCRE is built with
+Unicode property support. To to this you must set --enable-unicode-properties
+when running "configure".
+
+
 Character tables
 ----------------

@@ -732,7 +742,7 @@
   doc/perltest.txt        plain text documentation of Perl test program
   install-sh              a shell script for installing files
   libpcre.pc.in           template for libpcre.pc for pkg-config
-  libpcreposix.pc.in      template for libpcreposix.pc for pkg-config 
+  libpcreposix.pc.in      template for libpcreposix.pc for pkg-config
   libpcrecpp.pc.in        template for libpcrecpp.pc for pkg-config
   ltmain.sh               file used to build a libtool script
   missing                 ) common stub for a few missing GNU programs while
@@ -776,4 +786,4 @@
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 16 September 2009
+Last updated: 05 October 2009


Modified: code/trunk/RunGrepTest
===================================================================
--- code/trunk/RunGrepTest    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/RunGrepTest    2009-10-05 10:59:35 UTC (rev 461)
@@ -29,9 +29,9 @@
 # that lacks a -u option. Try to deal with this; better do the test for the -b
 # option as well.


-if diff -u /dev/null /dev/null; then
+if diff -u /dev/null /dev/null; then
if diff -ub /dev/null /dev/null; then cf="diff -ub"; else cf="diff -u"; fi
-else
+else
if diff -b /dev/null /dev/null; then cf="diff -b"; else cf="diff"; fi
fi
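
The shell fragment above picks the richest diff command the local system supports by probing each option against /dev/null. The same decision table is sketched in Python below. The "supports" callback is an assumption introduced so the selection logic can be read, and exercised, separately from the probing. probe_diff is one possible probe and is likewise hypothetical.

```python
import subprocess


def choose_diff_command(supports):
    """Mirror the RunGrepTest selection: prefer -u, and add -b when available."""
    if supports("-u"):
        return "diff -ub" if supports("-ub") else "diff -u"
    return "diff -b" if supports("-b") else "diff"


def probe_diff(flags):
    """Return True if `diff <flags> /dev/null /dev/null` exits with status 0."""
    try:
        result = subprocess.run(
            ["diff", flags, "/dev/null", "/dev/null"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:           # no diff binary on this system
        return False
    return result.returncode == 0
```

Calling choose_diff_command(probe_diff) would reproduce the value the script stores in cf on a system that has a diff command.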


Modified: code/trunk/RunTest
===================================================================
--- code/trunk/RunTest    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/RunTest    2009-10-05 10:59:35 UTC (rev 461)
@@ -60,7 +60,7 @@
     9) do9=yes;;
    10) do10=yes;;
    11) do11=yes;;
-   12) do12=yes;;  
+   12) do12=yes;;
    valgrind) valgrind="valgrind -q";;
     *) echo "Unknown test number $1"; exit 1;;
   esac
@@ -124,7 +124,7 @@
   if [ $utf8 -ne 0 -a $ucp -ne 0 ] ; then do9=yes; fi
   if [ $link_size -eq 2 -a $ucp -ne 0 ] ; then do10=yes; fi
   do11=yes
-  if [ $utf8 -ne 0 -a $ucp -ne 0 ] ; then do12=yes; fi  
+  if [ $utf8 -ne 0 -a $ucp -ne 0 ] ; then do12=yes; fi
 fi


# Show which release

Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/configure.ac    2009-10-05 10:59:35 UTC (rev 461)
@@ -9,7 +9,7 @@
 m4_define(pcre_major, [8])
 m4_define(pcre_minor, [00])
 m4_define(pcre_prerelease, [-RC1])
-m4_define(pcre_date, [2009-09-05])
+m4_define(pcre_date, [2009-10-05])


# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [0:1:0])

Modified: code/trunk/doc/html/index.html
===================================================================
--- code/trunk/doc/html/index.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/index.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -1,10 +1,10 @@
 <html>
-<!-- This is a manually maintained file that is the root of the HTML version of 
-     the PCRE documentation. When the HTML documents are built from the man 
-     page versions, the entire doc/html directory is emptied, this file is then 
-     copied into doc/html/index.html, and the remaining files therein are 
+<!-- This is a manually maintained file that is the root of the HTML version of
+     the PCRE documentation. When the HTML documents are built from the man
+     page versions, the entire doc/html directory is emptied, this file is then
+     copied into doc/html/index.html, and the remaining files therein are
      created by the 132html script.
--->      
+-->
 <head>
 <title>PCRE specification</title>
 </head>
@@ -74,11 +74,11 @@
 </table>


<p>
-There are also individual pages that summarize the interface for each function
+There are also individual pages that summarize the interface for each function
in the library:
</p>

-<table>    
+<table>


 <tr><td><a href="pcre_compile.html">pcre_compile</a></td>
     <td>&nbsp;&nbsp;Compile a regular expression</td></tr>
@@ -129,7 +129,7 @@


 <tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
     <td>&nbsp;&nbsp;Build character tables in current locale</td></tr>
-    
+
 <tr><td><a href="pcre_refcount.html">pcre_refcount</a></td>
     <td>&nbsp;&nbsp;Maintain reference count in compiled pattern</td></tr>



Modified: code/trunk/doc/html/pcre.html
===================================================================
--- code/trunk/doc/html/pcre.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcre.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -24,23 +24,22 @@
 <P>
 The PCRE library is a set of functions that implement regular expression
 pattern matching using the same syntax and semantics as Perl, with just a few
-differences. Certain features that appeared in Python and PCRE before they
-appeared in Perl are also available using the Python syntax. There is also some
-support for certain .NET and Oniguruma syntax items, and there is an option for
-requesting some minor changes that give better JavaScript compatibility.
+differences. Some features that appeared in Python and PCRE before they
+appeared in Perl are also available using the Python syntax, there is some
+support for one or two .NET and Oniguruma syntax items, and there is an option
+for requesting some minor changes that give better JavaScript compatibility.
 </P>
 <P>
-The current implementation of PCRE (release 8.xx) corresponds approximately
-with Perl 5.10, including support for UTF-8 encoded strings and Unicode general
-category properties. However, UTF-8 and Unicode support has to be explicitly
-enabled; it is not the default. The Unicode tables correspond to Unicode
-release 5.1.
+The current implementation of PCRE corresponds approximately with Perl 5.10,
+including support for UTF-8 encoded strings and Unicode general category
+properties. However, UTF-8 and Unicode support has to be explicitly enabled; it
+is not the default. The Unicode tables correspond to Unicode release 5.1.
 </P>
 <P>
 In addition to the Perl-compatible matching function, PCRE contains an
-alternative matching function that matches the same compiled patterns in a
-different way. In certain circumstances, the alternative function has some
-advantages. For a discussion of the two matching algorithms, see the
+alternative function that matches the same compiled patterns in a different
+way. In certain circumstances, the alternative function has some advantages.
+For a discussion of the two matching algorithms, see the
 <a href="pcrematching.html"><b>pcrematching</b></a>
 page.
 </P>
@@ -72,7 +71,8 @@
 available. The features themselves are described in the
 <a href="pcrebuild.html"><b>pcrebuild</b></a>
 page. Documentation about building PCRE for various operating systems can be
-found in the <b>README</b> file in the source distribution.
+found in the <b>README</b> and <b>NON-UNIX-USE</b> files in the source
+distribution.
 </P>
 <P>
 The library contains a number of undocumented internal functions and data
@@ -103,12 +103,12 @@
   pcrematching      discussion of the two matching algorithms
   pcrepartial       details of the partial matching facility
   pcrepattern       syntax and semantics of supported regular expressions
-  pcresyntax        quick syntax reference
   pcreperform       discussion of performance issues
   pcreposix         the POSIX-compatible C API
   pcreprecompile    details of saving and re-using precompiled patterns
   pcresample        discussion of the pcredemo program
   pcrestack         discussion of stack usage
+  pcresyntax        quick syntax reference
   pcretest          description of the <b>pcretest</b> testing command
 </pre>
 In addition, in the "man" and HTML formats, there is a short page for each
@@ -164,7 +164,7 @@
 with the PCRE_UTF8 option flag, or the pattern must start with the sequence
 (*UTF8). When either of these is the case, both the pattern and any subject
 strings that are matched against it are treated as UTF-8 strings instead of
-just strings of bytes.
+strings of 1-byte characters.
 </P>
 <P>
 If you compile PCRE with UTF-8 support, but do not use it at run time, the
@@ -298,7 +298,7 @@
 </P>
 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 01 September 2009
+Last updated: 28 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcre_compile.html
===================================================================
--- code/trunk/doc/html/pcre_compile.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcre_compile.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -63,11 +63,11 @@
   PCRE_NEWLINE_LF         Set LF as the newline sequence
   PCRE_NO_AUTO_CAPTURE    Disable numbered capturing paren-
                             theses (named ones available)
-  PCRE_UNGREEDY           Invert greediness of quantifiers
-  PCRE_UTF8               Run in UTF-8 mode
   PCRE_NO_UTF8_CHECK      Do not check the pattern for UTF-8
                             validity (only relevant if
                             PCRE_UTF8 is set)
+  PCRE_UNGREEDY           Invert greediness of quantifiers
+  PCRE_UTF8               Run in UTF-8 mode
 </pre>
 PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
 PCRE_NO_UTF8_CHECK.


Modified: code/trunk/doc/html/pcre_compile2.html
===================================================================
--- code/trunk/doc/html/pcre_compile2.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcre_compile2.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -45,29 +45,33 @@
 </pre>
 The option bits are:
 <pre>
-  PCRE_ANCHORED         Force pattern anchoring
-  PCRE_AUTO_CALLOUT     Compile automatic callouts
-  PCRE_CASELESS         Do caseless matching
-  PCRE_DOLLAR_ENDONLY   $ not to match newline at end
-  PCRE_DOTALL           . matches anything including NL
-  PCRE_DUPNAMES         Allow duplicate names for subpatterns
-  PCRE_EXTENDED         Ignore whitespace and # comments
-  PCRE_EXTRA            PCRE extra features
-                          (not much use currently)
-  PCRE_FIRSTLINE        Force matching to be before newline
-  PCRE_MULTILINE        ^ and $ match newlines within data
-  PCRE_NEWLINE_ANY      Recognize any Unicode newline sequence
-  PCRE_NEWLINE_ANYCRLF  Recognize CR, LF, and CRLF as newline sequences
-  PCRE_NEWLINE_CR       Set CR as the newline sequence
-  PCRE_NEWLINE_CRLF     Set CRLF as the newline sequence
-  PCRE_NEWLINE_LF       Set LF as the newline sequence
-  PCRE_NO_AUTO_CAPTURE  Disable numbered capturing paren-
-                          theses (named ones available)
-  PCRE_UNGREEDY         Invert greediness of quantifiers
-  PCRE_UTF8             Run in UTF-8 mode
-  PCRE_NO_UTF8_CHECK    Do not check the pattern for UTF-8
-                          validity (only relevant if
-                          PCRE_UTF8 is set)
+  PCRE_ANCHORED           Force pattern anchoring
+  PCRE_AUTO_CALLOUT       Compile automatic callouts
+  PCRE_BSR_ANYCRLF        \R matches only CR, LF, or CRLF
+  PCRE_BSR_UNICODE        \R matches all Unicode line endings
+  PCRE_CASELESS           Do caseless matching
+  PCRE_DOLLAR_ENDONLY     $ not to match newline at end
+  PCRE_DOTALL             . matches anything including NL
+  PCRE_DUPNAMES           Allow duplicate names for subpatterns
+  PCRE_EXTENDED           Ignore whitespace and # comments
+  PCRE_EXTRA              PCRE extra features
+                            (not much use currently)
+  PCRE_FIRSTLINE          Force matching to be before newline
+  PCRE_JAVASCRIPT_COMPAT  JavaScript compatibility
+  PCRE_MULTILINE          ^ and $ match newlines within data
+  PCRE_NEWLINE_ANY        Recognize any Unicode newline sequence
+  PCRE_NEWLINE_ANYCRLF    Recognize CR, LF, and CRLF as newline
+                            sequences
+  PCRE_NEWLINE_CR         Set CR as the newline sequence
+  PCRE_NEWLINE_CRLF       Set CRLF as the newline sequence
+  PCRE_NEWLINE_LF         Set LF as the newline sequence
+  PCRE_NO_AUTO_CAPTURE    Disable numbered capturing paren-
+                            theses (named ones available)
+  PCRE_NO_UTF8_CHECK      Do not check the pattern for UTF-8
+                            validity (only relevant if
+                            PCRE_UTF8 is set)
+  PCRE_UNGREEDY           Invert greediness of quantifiers
+  PCRE_UTF8               Run in UTF-8 mode
 </pre>
 PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
 PCRE_NO_UTF8_CHECK.


Modified: code/trunk/doc/html/pcre_dfa_exec.html
===================================================================
--- code/trunk/doc/html/pcre_dfa_exec.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcre_dfa_exec.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -67,8 +67,8 @@
                            was set at compile time)
   PCRE_PARTIAL           ) Return PCRE_ERROR_PARTIAL for a partial
   PCRE_PARTIAL_SOFT      )   match if no full matches are found
-  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match 
-                           even if there is a full match as well 
+  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match
+                           even if there is a full match as well
   PCRE_DFA_SHORTEST      Return only the shortest match
   PCRE_DFA_RESTART       Restart after a partial match
 </pre>


Modified: code/trunk/doc/html/pcre_exec.html
===================================================================
--- code/trunk/doc/html/pcre_exec.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcre_exec.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -63,8 +63,8 @@
                            was set at compile time)
   PCRE_PARTIAL           ) Return PCRE_ERROR_PARTIAL for a partial
   PCRE_PARTIAL_SOFT      )   match if no full matches are found
-  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match 
-                           even if there is a full match as well 
+  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match
+                           even if there is a full match as well
 </pre>
 For details of partial matching, see the
 <a href="pcrepartial.html"><b>pcrepartial</b></a>


Modified: code/trunk/doc/html/pcre_fullinfo.html
===================================================================
--- code/trunk/doc/html/pcre_fullinfo.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcre_fullinfo.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -45,6 +45,7 @@
   PCRE_INFO_FIRSTTABLE      Table of first bytes (after studying)
   PCRE_INFO_JCHANGED        Return 1 if (?J) or (?-J) was used
   PCRE_INFO_LASTLITERAL     Literal last byte required
+  PCRE_INFO_MINLENGTH       Lower bound length of matching strings
   PCRE_INFO_NAMECOUNT       Number of named subpatterns
   PCRE_INFO_NAMEENTRYSIZE   Size of name table entry
   PCRE_INFO_NAMETABLE       Pointer to name table


Modified: code/trunk/doc/html/pcreapi.html
===================================================================
--- code/trunk/doc/html/pcreapi.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcreapi.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -400,7 +400,9 @@
 Either of the functions <b>pcre_compile()</b> or <b>pcre_compile2()</b> can be
 called to compile a pattern into an internal form. The only difference between
 the two interfaces is that <b>pcre_compile2()</b> has an additional argument,
-<i>errorcodeptr</i>, via which a numerical error code can be returned.
+<i>errorcodeptr</i>, via which a numerical error code can be returned. To avoid
+too much repetition, we refer just to <b>pcre_compile()</b> below, but the
+information applies equally to <b>pcre_compile2()</b>.
 </P>
 <P>
 The pattern is a C string terminated by a binary zero, and is passed in the
@@ -420,14 +422,14 @@
 The <i>options</i> argument contains various bit settings that affect the
 compilation. It should be zero if no options are required. The available
 options are described below. Some of them (in particular, those that are
-compatible with Perl, but also some others) can also be set and unset from
+compatible with Perl, but some others as well) can also be set and unset from
 within the pattern (see the detailed description in the
 <a href="pcrepattern.html"><b>pcrepattern</b></a>
 documentation). For those options that can be different in different parts of
-the pattern, the contents of the <i>options</i> argument specifies their initial
-settings at the start of compilation and execution. The PCRE_ANCHORED and
-PCRE_NEWLINE_<i>xxx</i> options can be set at the time of matching as well as at
-compile time.
+the pattern, the contents of the <i>options</i> argument specifies their
+settings at the start of compilation and execution. The PCRE_ANCHORED,
+PCRE_BSR_<i>xxx</i>, and PCRE_NEWLINE_<i>xxx</i> options can be set at the time
+of matching as well as at compile time.
 </P>
 <P>
 If <i>errptr</i> is NULL, <b>pcre_compile()</b> returns NULL immediately.
@@ -435,7 +437,7 @@
 NULL, and sets the variable pointed to by <i>errptr</i> to point to a textual
 error message. This is a static string that is part of the library. You must
 not try to free it. The byte offset from the start of the pattern to the
-character that was being processes when the error was discovered is placed in
+character that was being processed when the error was discovered is placed in
 the variable pointed to by <i>erroffset</i>, which must not be NULL. If it is,
 an immediate error is given. Some errors are not detected until checks are
 carried out when the whole pattern has been scanned; in this case the offset is
@@ -772,17 +774,17 @@
 </P>
 <P>
 The returned value from <b>pcre_study()</b> can be passed directly to
-<b>pcre_exec()</b>. However, a <b>pcre_extra</b> block also contains other
-fields that can be set by the caller before the block is passed; these are
-described
+<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>. However, a <b>pcre_extra</b> block
+also contains other fields that can be set by the caller before the block is
+passed; these are described
 <a href="#extradata">below</a>
 in the section on matching a pattern.
 </P>
 <P>
-If studying the pattern does not produce any additional information
+If studying the pattern does not produce any useful information,
 <b>pcre_study()</b> returns NULL. In that circumstance, if the calling program
-wants to pass any of the other fields to <b>pcre_exec()</b>, it must set up its
-own <b>pcre_extra</b> block.
+wants to pass any of the other fields to <b>pcre_exec()</b> or
+<b>pcre_dfa_exec()</b>, it must set up its own <b>pcre_extra</b> block.
 </P>
 <P>
 The second argument of <b>pcre_study()</b> contains option bits. At present, no
@@ -805,9 +807,19 @@
     0,              /* no options exist */
     &error);        /* set to NULL or points to a message */
 </pre>
-At present, studying a pattern is useful only for non-anchored patterns that do
-not have a single fixed starting character. A bitmap of possible starting
-bytes is created.
+Studying a pattern does two things: first, a lower bound for the length of
+subject string that is needed to match the pattern is computed. This does not
+mean that there are any strings of that length that match, but it does
+guarantee that no shorter strings match. The value is used by
+<b>pcre_exec()</b> and <b>pcre_dfa_exec()</b> to avoid wasting time by trying to
+match strings that are shorter than the lower bound. You can find out the value
+in a calling program via the <b>pcre_fullinfo()</b> function.
+</P>
+<P>
+Studying a pattern is also useful for non-anchored patterns that do not have a
+single fixed starting character. A bitmap of possible starting bytes is
+created. This speeds up finding a position in the subject at which to start
+matching.
 <a name="localesupport"></a></P>
 <br><a name="SEC10" href="#TOC1">LOCALE SUPPORT</a><br>
 <P>
@@ -978,6 +990,16 @@
 /^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/ the returned value
 is -1.
 <pre>
+  PCRE_INFO_MINLENGTH
+</pre>
+If the pattern was studied and a minimum length for matching subject strings
+was computed, its value is returned. Otherwise the returned value is -1. The
+value is a number of characters, not bytes (this may be relevant in UTF-8
+mode). The fourth argument should point to an <b>int</b> variable. A
+non-negative value is a lower bound to the length of any matching string. There
+may not be any strings of that length that do actually match, but every string
+that does match is at least that long.
+<pre>
   PCRE_INFO_NAMECOUNT
   PCRE_INFO_NAMEENTRYSIZE
   PCRE_INFO_NAMETABLE
@@ -999,10 +1021,24 @@
 length of the longest name. PCRE_INFO_NAMETABLE returns a pointer to the first
 entry of the table (a pointer to <b>char</b>). The first two bytes of each entry
 are the number of the capturing parenthesis, most significant byte first. The
-rest of the entry is the corresponding name, zero terminated. The names are in
-alphabetical order. When PCRE_DUPNAMES is set, duplicate names are in order of
-their parentheses numbers. For example, consider the following pattern (assume
-PCRE_EXTENDED is set, so white space - including newlines - is ignored):
+rest of the entry is the corresponding name, zero terminated.
+</P>
+<P>
+The names are in alphabetical order. Duplicate names may appear if (?| is used
+to create multiple groups with the same number, as described in the
+<a href="pcrepattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>
+in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+page. Duplicate names for subpatterns with different numbers are permitted only
+if PCRE_DUPNAMES is set. In all cases of duplicate names, they appear in the
+table in the order in which they were found in the pattern. In the absence of
+(?| this is the order of increasing number; when (?| is used this is not
+necessarily the case because later subpatterns may have lower numbers.
+</P>
+<P>
+As a simple example of the name/number table, consider the following pattern
+(assume PCRE_EXTENDED is set, so white space - including newlines - is
+ignored):
 <pre>
   (?&#60;date&#62; (?&#60;year&#62;(\d\d)?\d\d) - (?&#60;month&#62;\d\d) - (?&#60;day&#62;\d\d) )
 </pre>
@@ -1062,7 +1098,8 @@
 Return the size of the data block pointed to by the <i>study_data</i> field in
 a <b>pcre_extra</b> block. That is, it is the value that was passed to
 <b>pcre_malloc()</b> when PCRE was getting memory into which to place the data
-created by <b>pcre_study()</b>. The fourth argument should point to a
+created by <b>pcre_study()</b>. If <b>pcre_extra</b> is NULL, or there is no
+study data, zero is returned. The fourth argument should point to a
 <b>size_t</b> variable.
 </P>
 <br><a name="SEC12" href="#TOC1">OBSOLETE INFO FUNCTION</a><br>
@@ -1122,7 +1159,7 @@
 <P>
 The function <b>pcre_exec()</b> is called to match a subject string against a
 compiled pattern, which is passed in the <i>code</i> argument. If the
-pattern has been studied, the result of the study should be passed in the
+pattern was studied, the result of the study should be passed in the
 <i>extra</i> argument. This function is the main matching facility of the
 library, and it operates in a Perl-like manner. For specialist use there is
 also an alternative matching function, which is described
@@ -1189,7 +1226,7 @@
 The <i>match_limit</i> field provides a means of preventing PCRE from using up a
 vast amount of resources when running patterns that are not going to match,
 but which have a very large number of possibilities in their search trees. The
-classic example is the use of nested unlimited repeats.
+classic example is a pattern that uses nested unlimited repeats.
 </P>
 <P>
 Internally, PCRE uses a function called <b>match()</b> which it calls repeatedly
@@ -1339,7 +1376,7 @@
 <pre>
   PCRE_NOTEMPTY_ATSTART
 </pre>
-This is like PCRE_NOTEMPTY, except that an empty string match that is not at 
+This is like PCRE_NOTEMPTY, except that an empty string match that is not at
 the start of the subject is permitted. If the pattern is anchored, such a match
 can occur only if the pattern contains \K.
 </P>
@@ -1390,7 +1427,7 @@
 subject, or a value of <i>startoffset</i> that does not point to the start of a
 UTF-8 character, is undefined. Your program may crash.
 <pre>
-  PCRE_PARTIAL_HARD 
+  PCRE_PARTIAL_HARD
   PCRE_PARTIAL_SOFT
 </pre>
 These options turn on the partial matching feature. For backwards
@@ -1499,7 +1536,7 @@
 advisable to supply an <i>ovector</i>.
 </P>
 <P>
-The <b>pcre_info()</b> function can be used to find out how many capturing
+The <b>pcre_fullinfo()</b> function can be used to find out how many capturing
 subpatterns there are in a compiled pattern. The smallest size for
 <i>ovector</i> that will allow for <i>n</i> captured substrings, in addition to
 the offsets of the substring matched by the whole pattern, is (<i>n</i>+1)*3.
@@ -1605,7 +1642,7 @@
 </pre>
 This code is no longer in use. It was formerly returned when the PCRE_PARTIAL
 option was used with a compiled pattern containing items that were not
-supported for partial matching. From release 8.00 onwards, there are no 
+supported for partial matching. From release 8.00 onwards, there are no
 restrictions on partial matching.
 <pre>
   PCRE_ERROR_INTERNAL       (-14)
@@ -1779,10 +1816,15 @@
 the behaviour may not be what you want (see the next section).
 </P>
 <P>
-<b>Warning:</b> If the pattern uses the "(?|" feature to set up multiple
-subpatterns with the same number, you cannot use names to distinguish them,
-because names are not included in the compiled code. The matching process uses
-only numbers.
+<b>Warning:</b> If the pattern uses the (?| feature to set up multiple
+subpatterns with the same number, as described in the
+<a href="pcrepattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>
+in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+page, you cannot use names to distinguish the different subpatterns, because
+names are not included in the compiled code. The matching process uses only
+numbers. For this reason, the use of different names for subpatterns of the
+same number causes an error at compile time.
 </P>
 <br><a name="SEC17" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
 <P>
@@ -1791,9 +1833,13 @@
 </P>
 <P>
 When a pattern is compiled with the PCRE_DUPNAMES option, names for subpatterns
-are not required to be unique. Normally, patterns with duplicate names are such
-that in any one match, only one of the named subpatterns participates. An
-example is shown in the
+are not required to be unique. (Duplicate names are always allowed for
+subpatterns with the same number, created by using the (?| feature. Indeed, if
+such subpatterns are named, they are required to use the same names.)
+</P>
+<P>
+Normally, patterns with duplicate names are such that in any one match, only
+one of the named subpatterns participates. An example is shown in the
 <a href="pcrepattern.html"><b>pcrepattern</b></a>
 documentation.
 </P>
@@ -1849,7 +1895,7 @@
 just once, and does not backtrack. This has different characteristics to the
 normal algorithm, and is not compatible with Perl. Some of the features of PCRE
 patterns are not supported. Nevertheless, there are times when this kind of
-matching can be useful. For a discussion of the two matching algorithms, and a 
+matching can be useful. For a discussion of the two matching algorithms, and a
 list of features that <b>pcre_dfa_exec()</b> does not support, see the
 <a href="pcrematching.html"><b>pcrematching</b></a>
 documentation.
@@ -1898,7 +1944,7 @@
 for <b>pcre_exec()</b>, so their description is not repeated here.
 <pre>
   PCRE_PARTIAL_HARD
-  PCRE_PARTIAL_SOFT 
+  PCRE_PARTIAL_SOFT
 </pre>
 These have the same general effect as they do for <b>pcre_exec()</b>, but the
 details are slightly different. When PCRE_PARTIAL_HARD is set for
@@ -2021,7 +2067,7 @@
 </P>
 <br><a name="SEC22" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 22 September 2009
+Last updated: 03 October 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
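
Reviewer note on the pcreapi.html changes above: the name table layout and the (n+1)*3 ovector rule can be sketched in C as below. This is only an illustration under stated assumptions. The helper names ovector_size() and group_number_for_name() are hypothetical and are not part of the PCRE API. The table is assumed to be laid out exactly as the text describes: fixed-size entries, two bytes of group number (most significant byte first), then the zero-terminated name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: smallest ovector length (in ints) that holds
   capture_count captured substrings plus the whole-pattern match,
   following the (n+1)*3 rule quoted in the documentation. */
static int ovector_size(int capture_count)
{
    assert(capture_count >= 0);
    return (capture_count + 1) * 3;
}

/* Hypothetical helper: find a group number by name in a name table.
   Each entry is entry_size bytes long. Bytes 0 and 1 hold the group
   number, most significant byte first. The name starts at byte 2 and
   is zero terminated. Returns the number stored in the first entry
   whose name matches, or -1 if the name is not in the table. */
static int group_number_for_name(const unsigned char *table, int name_count,
                                 int entry_size, const char *name)
{
    int i;
    assert(entry_size > 2);
    for (i = 0; i < name_count; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

In a real program the table pointer, entry count, and entry size would come from pcre_fullinfo() with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE. With duplicate names, this sketch returns only the first entry found.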


Modified: code/trunk/doc/html/pcrebuild.html
===================================================================
--- code/trunk/doc/html/pcrebuild.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcrebuild.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -40,12 +40,12 @@
 <b>configure</b> before running the <b>make</b> command. However, the same
 options can be selected in both Unix-like and non-Unix-like environments using
 the GUI facility of <b>cmake-gui</b> if you are using <b>CMake</b> instead of
-<b>configure</b> to build PCRE. 
+<b>configure</b> to build PCRE.
 </P>
 <P>
-There is a lot more information about building PCRE in non-Unix-like 
-environments in the file called <i>NON_UNIX_USE</i>, which is part of the PCRE 
-distribution. You should consult this file as well as the <i>README</i> file if 
+There is a lot more information about building PCRE in non-Unix-like
+environments in the file called <i>NON-UNIX-USE</i>, which is part of the PCRE
+distribution. You should consult this file as well as the <i>README</i> file if
 you are building in a non-Unix-like environment.
 </P>
 <P>
@@ -80,7 +80,7 @@
 to the <b>configure</b> command. Of itself, this does not make PCRE treat
 strings as UTF-8. As well as compiling PCRE with this option, you also have
 to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>
-function.
+or <b>pcre_compile2()</b> functions.
 </P>
 <P>
 If you set --enable-utf8 when compiling in an EBCDIC environment, PCRE expects
@@ -186,8 +186,8 @@
 metacharacter). By default, two-byte values are used for these offsets, leading
 to a maximum size for a compiled pattern of around 64K. This is sufficient to
 handle all but the most gigantic patterns. Nevertheless, some people do want to
-process enormous patterns, so it is possible to compile PCRE to use three-byte
-or four-byte offsets by adding a setting such as
+process truly enormous patterns, so it is possible to compile PCRE to use
+three-byte or four-byte offsets by adding a setting such as
 <pre>
   --with-link-size=3
 </pre>
@@ -215,7 +215,7 @@
 <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
 management functions. By default these point to <b>malloc()</b> and
 <b>free()</b>, but you can replace the pointers so that your own functions are
-used.
+used instead.
 </P>
 <P>
 Separate functions are provided rather than using <b>pcre_malloc</b> and
@@ -224,7 +224,7 @@
 order. A calling program might be able to implement optimized functions that
 perform better than <b>malloc()</b> and <b>free()</b>. PCRE runs noticeably more
 slowly when built in this way. This option affects only the <b>pcre_exec()</b>
-function; it is not relevant for the the <b>pcre_dfa_exec()</b> function.
+function; it is not relevant for <b>pcre_dfa_exec()</b>.
 </P>
 <br><a name="SEC11" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
 <P>
@@ -308,7 +308,7 @@
 to the <b>configure</b> command, <b>pcretest</b> is linked with the
 <b>libreadline</b> library, and when its input is from a terminal, it reads it
 using the <b>readline()</b> function. This provides line-editing and history
-facilities. Note that <b>libreadline</b> is GPL-licenced, so if you distribute a
+facilities. Note that <b>libreadline</b> is GPL-licensed, so if you distribute a
 binary of <b>pcretest</b> linked in this way, there may be licensing issues.
 </P>
 <P>
@@ -345,7 +345,7 @@
 </P>
 <br><a name="SEC18" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 06 September 2009
+Last updated: 29 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
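
Reviewer note on the link-size paragraph in pcrebuild.html above: the "around 64K" figure follows from the width of the offset field. The sketch below shows only that arithmetic. The function max_link_offset() is a hypothetical name, and the code does not claim to describe how PCRE itself enforces pattern size limits.

```c
#include <assert.h>

/* Hypothetical helper: largest offset that fits in link_size bytes.
   Covers the two-, three- and four-byte settings named in the text. */
static unsigned long long max_link_offset(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return (1ULL << (8 * link_size)) - 1;
}
```

Two bytes give 65535, which matches the documented limit of roughly 64K for a compiled pattern. Three bytes give 16777215, and four bytes give 4294967295.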


Modified: code/trunk/doc/html/pcrecallout.html
===================================================================
--- code/trunk/doc/html/pcrecallout.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcrecallout.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -39,9 +39,10 @@
 <pre>
   (?C1)abc(?C2)def
 </pre>
-If the PCRE_AUTO_CALLOUT option bit is set when <b>pcre_compile()</b> is called,
-PCRE automatically inserts callouts, all with number 255, before each item in
-the pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
+If the PCRE_AUTO_CALLOUT option bit is set when <b>pcre_compile()</b> or
+<b>pcre_compile2()</b> is called, PCRE automatically inserts callouts, all with
+number 255, before each item in the pattern. For example, if PCRE_AUTO_CALLOUT
+is used with the pattern
 <pre>
   A(\d{2}|--)
 </pre>
@@ -73,6 +74,12 @@
 no match, the callout is obeyed.
 </P>
 <P>
+If the pattern is studied, PCRE knows the minimum length of a matching string,
+and will immediately give a "no match" return without actually running a match
+if the subject is not long enough, or, for unanchored patterns, if it has
+been scanned far enough.
+</P>
+<P>
 You can disable these optimizations by passing the PCRE_NO_START_OPTIMIZE
 option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>. This slows down the
 matching process, but does ensure that callouts such as the example above are
@@ -179,7 +186,7 @@
 matching proceeds as normal. If the value is greater than zero, matching fails
 at the current point, but the testing of other matching possibilities goes
 ahead, just as if a lookahead assertion had failed. If the value is less than
-zero, the match is abandoned, and <b>pcre_exec()</b> (or <b>pcre_dfa_exec()</b>)
+zero, the match is abandoned, and <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
 returns the negative value.
 </P>
 <P>
@@ -199,7 +206,7 @@
 </P>
 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 15 March 2009
+Last updated: 29 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
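
Reviewer note on the new pcrecallout.html paragraph above: the early "no match" return for studied patterns amounts to comparing the remaining subject length with the lower bound. The sketch below is a minimal illustration, not PCRE's internal code. The name too_short_to_match() is hypothetical, and a negative bound is assumed to mean that no minimum length is known, as with the -1 returned for PCRE_INFO_MINLENGTH.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when a match attempt starting at
   start_offset cannot succeed because fewer than min_length units of
   the subject remain, and 0 otherwise. */
static int too_short_to_match(size_t subject_length, size_t start_offset,
                              int min_length)
{
    assert(start_offset <= subject_length);
    if (min_length < 0)
        return 0;   /* no lower bound known, so the match must be tried */
    return subject_length - start_offset < (size_t)min_length;
}
```

The bound is a number of characters, while lengths passed to the matching functions are in bytes. In UTF-8 mode a string never has more characters than bytes, so comparing a byte count against the bound can only skip attempts that could not have matched.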


Modified: code/trunk/doc/html/pcrecompat.html
===================================================================
--- code/trunk/doc/html/pcrecompat.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcrecompat.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -17,9 +17,8 @@
 </b><br>
 <P>
 This document describes the differences in the ways that PCRE and Perl handle
-regular expressions. The differences described here are mainly with respect to
-Perl 5.8, though PCRE versions 7.0 and later contain some features that are
-in Perl 5.10.
+regular expressions. The differences described here are with respect to Perl
+5.10.
 </P>
 <P>
 1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
@@ -90,11 +89,11 @@
 </P>
 <P>
 9. Subpatterns that are called recursively or as "subroutines" are always
-treated as atomic groups in PCRE. This is like Python, but unlike Perl. There 
+treated as atomic groups in PCRE. This is like Python, but unlike Perl. There
 is a discussion of an example that explains this in more detail in the
 <a href="pcrepattern.html#recursiondifference">section on recursion differences from Perl</a>
 in the
-<a href="pcrecompat.html"><b>pcrecompat</b></a>
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
 page.
 </P>
 <P>
@@ -108,15 +107,26 @@
 argument. PCRE does not support (*MARK).
 </P>
 <P>
-12. PCRE provides some extensions to the Perl regular expression facilities.
-Perl 5.10 will include new features that are not in earlier versions, some of
-which (such as named parentheses) have been in PCRE for some time. This list is
-with respect to Perl 5.10:
+12. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
+names is not as general as Perl's. This is a consequence of the fact that PCRE
+works internally just with numbers, using an external table to translate
+between numbers and names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B),
+where the two capturing parentheses have the same number but different names,
+is not supported, and causes an error at compile time. If it were allowed, it
+would not be possible to distinguish which parentheses matched, because both
+names map to capturing subpattern number 1. To avoid this confusing situation,
+an error is given at compile time.
+</P>
+<P>
+13. PCRE provides some extensions to the Perl regular expression facilities.
+Perl 5.10 includes new features that are not in earlier versions of Perl, some
+of which (such as named parentheses) have been in PCRE for some time. This list
+is with respect to Perl 5.10:
 <br>
 <br>
-(a) Although lookbehind assertions must match fixed length strings, each
-alternative branch of a lookbehind assertion can match a different length of
-string. Perl requires them all to have the same length.
+(a) Although lookbehind assertions in PCRE must match fixed length strings,
+each alternative branch of a lookbehind assertion can match a different length
+of string. Perl requires them all to have the same length.
 <br>
 <br>
 (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
@@ -177,7 +187,7 @@
 REVISION
 </b><br>
 <P>
-Last updated: 18 September 2009
+Last updated: 04 October 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcregrep.html
===================================================================
--- code/trunk/doc/html/pcregrep.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcregrep.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -119,9 +119,9 @@
 </P>
 <br><a name="SEC4" href="#TOC1">OPTIONS</a><br>
 <P>
-The order in which some of the options appear can affect the output. For 
-example, both the <b>-h</b> and <b>-l</b> options affect the printing of file 
-names. Whichever comes later in the command line will be the one that takes 
+The order in which some of the options appear can affect the output. For
+example, both the <b>-h</b> and <b>-l</b> options affect the printing of file
+names. Whichever comes later in the command line will be the one that takes
 effect.
 </P>
 <P>
@@ -326,9 +326,9 @@
 Instead of outputting lines from the files, just output the names of the files
 containing lines that would have been output. Each file name is output
 once, on a separate line. Searching normally stops as soon as a matching line
-is found in a file. However, if the <b>-c</b> (count) option is also used, 
-matching continues in order to obtain the correct count, and those files that 
-have at least one match are listed along with their counts. Using this option 
+is found in a file. However, if the <b>-c</b> (count) option is also used,
+matching continues in order to obtain the correct count, and those files that
+have at least one match are listed along with their counts. Using this option
 with <b>-c</b> is a way of suppressing the listing of files with no matches.
 </P>
 <P>
@@ -474,8 +474,8 @@
 as in the GNU <b>grep</b> program. Any long option of the form
 <b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
 (PCRE terminology). However, the <b>--locale</b>, <b>-M</b>, <b>--multiline</b>,
-<b>-u</b>, and <b>--utf-8</b> options are specific to <b>pcregrep</b>. If both the 
-<b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names, 
+<b>-u</b>, and <b>--utf-8</b> options are specific to <b>pcregrep</b>. If both the
+<b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names,
 without counts, but <b>pcregrep</b> gives the counts.
 </P>
 <br><a name="SEC8" href="#TOC1">OPTIONS WITH DATA</a><br>


Modified: code/trunk/doc/html/pcrematching.html
===================================================================
--- code/trunk/doc/html/pcrematching.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcrematching.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -96,13 +96,18 @@
 simultaneously).
 </P>
 <P>
+Although the general principle of this matching algorithm is that it scans the
+subject string only once, without backtracking, there is one exception: when a
+lookaround assertion is encountered, the characters following or preceding the
+current point have to be independently inspected.
+</P>
+<P>
 The scan continues until either the end of the subject is reached, or there are
 no more unterminated paths. At this point, terminated paths represent the
 different matching possibilities (if there are none, the match has failed).
 Thus, if there is more than one possible match, this algorithm finds all of
-them, and in particular, it finds the longest. In PCRE, there is an option to
-stop the algorithm after the first match (which is necessarily the shortest)
-has been found.
+them, and in particular, it finds the longest. There is an option to stop the
+algorithm after the first match (which is necessarily the shortest) is found.
 </P>
 <P>
 Note that all the matches that are found start at the same point in the
@@ -116,12 +121,6 @@
 matches that start at later positions.
 </P>
 <P>
-Although the general principle of this matching algorithm is that it scans the 
-subject string only once, without backtracking, there is one exception: when a 
-lookbehind assertion is encountered, the preceding characters have to be
-re-inspected.
-</P>
-<P>
 There are a number of features of PCRE regular expressions that are not
 supported by the alternative matching algorithm. They are as follows:
 </P>
@@ -186,7 +185,9 @@
 2. Because the alternative algorithm scans the subject string just once, and
 never needs to backtrack, it is possible to pass very long subject strings to
 the matching function in several pieces, checking for partial matching each
-time.
+time. The
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+documentation gives details of partial matching.
 </P>
 <br><a name="SEC6" href="#TOC1">DISADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
 <P>
@@ -215,7 +216,7 @@
 </P>
 <br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 05 September 2009
+Last updated: 29 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
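
The pcrematching.html text above says the alternative algorithm finds every match that starts at a given point, the longest in particular, and that the first one found is necessarily the shortest. The sketch below illustrates only that set of results. It is a brute-force enumeration using Python's standard re module, which is an assumption made for illustration; it is not how pcre_dfa_exec() works internally, and the helper name matches_from_start is hypothetical.

```python
import re

def matches_from_start(pattern, subject, start=0):
    """Return every substring beginning at `start` that the whole
    pattern matches, ordered from shortest to longest."""
    compiled = re.compile(pattern)
    found = []
    # Try each possible end offset; keep those where the pattern
    # matches exactly subject[start:end].
    for end in range(start, len(subject) + 1):
        if compiled.fullmatch(subject, start, end):
            found.append(subject[start:end])
    return found

all_matches = matches_from_start(r"cat(er(pillar)?)?", "caterpillar")
shortest, longest = all_matches[0], all_matches[-1]
print(all_matches)
```

Stopping after the first hit in the loop would correspond to the option that ends the scan at the first (shortest) match.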


Modified: code/trunk/doc/html/pcrepartial.html
===================================================================
--- code/trunk/doc/html/pcrepartial.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcrepartial.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -58,10 +58,13 @@
 are set, PCRE_PARTIAL_HARD takes precedence.
 </P>
 <P>
-Setting a partial matching option disables one of PCRE's optimizations. PCRE
+Setting a partial matching option disables two of PCRE's optimizations. PCRE
 remembers the last literal byte in a pattern, and abandons matching immediately
 if such a byte is not present in the subject string. This optimization cannot
-be used for a subject string that might match only partially.
+be used for a subject string that might match only partially. If the pattern
+was studied, PCRE knows the minimum length of a matching string, and does not
+bother to run the matching function on shorter strings. This optimization is
+also disabled for partial matching.
 </P>
 <br><a name="SEC2" href="#TOC1">PARTIAL MATCHING USING pcre_exec()</a><br>
 <P>
@@ -78,7 +81,7 @@
 vector, the first of them is set to the offset of the earliest character that
 was inspected when the partial match was found. For convenience, the second
 offset points to the end of the string so that a substring can easily be
-extracted.
+identified.
 </P>
 <P>
 For the majority of patterns, the first offset identifies the start of the
@@ -382,7 +385,7 @@
 </P>
 <br><a name="SEC11" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 05 September 2009
+Last updated: 29 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
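
The pcrepartial.html hunk above notes that two optimizations (the last-literal-byte check and the minimum-length check) are disabled for partial matching. The sketch below is a toy illustration of why, assuming a purely literal pattern and plain Python strings; the function literal_partial_match is hypothetical and does not model PCRE's actual implementation.

```python
def literal_partial_match(pattern, subject):
    """Toy partial matcher for a literal pattern: True if the subject
    ends with a non-empty proper prefix of the pattern."""
    longest_prefix = min(len(pattern) - 1, len(subject))
    for length in range(longest_prefix, 0, -1):
        if subject.endswith(pattern[:length]):
            return True
    return False

pattern = "abcd"
subject = "abc"   # first piece of a longer input; "d" may arrive later

# Check 1: is the pattern's last literal byte present in the subject?
last_literal_present = pattern[-1] in subject
# Check 2: is the subject at least as long as the shortest possible match?
long_enough = len(subject) >= len(pattern)
# Both checks would reject the subject, yet it is a valid partial match.
partial = literal_partial_match(pattern, subject)
```

Both pre-checks fail for this subject, so applying either of them would discard input that a later piece could complete.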


Modified: code/trunk/doc/html/pcrepattern.html
===================================================================
--- code/trunk/doc/html/pcrepattern.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcrepattern.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -61,10 +61,10 @@
 </P>
 <P>
 The original operation of PCRE was on strings of one-byte characters. However,
-there is now also support for UTF-8 character strings. To use this, you must
-build PCRE to include UTF-8 support, and then call <b>pcre_compile()</b> with
-the PCRE_UTF8 option. There is also a special sequence that can be given at the
-start of a pattern:
+there is now also support for UTF-8 character strings. To use this,
+PCRE must be built to include UTF-8 support, and you must call
+<b>pcre_compile()</b> or <b>pcre_compile2()</b> with the PCRE_UTF8 option. There
+is also a special sequence that can be given at the start of a pattern:
 <pre>
   (*UTF8)
 </pre>
@@ -111,8 +111,9 @@
   (*ANYCRLF)   any of the three above
   (*ANY)       all Unicode newline sequences
 </pre>
-These override the default and the options given to <b>pcre_compile()</b>. For
-example, on a Unix system where LF is the default newline sequence, the pattern
+These override the default and the options given to <b>pcre_compile()</b> or
+<b>pcre_compile2()</b>. For example, on a Unix system where LF is the default
+newline sequence, the pattern
 <pre>
   (*CR)a.b
 </pre>
@@ -228,9 +229,8 @@
 A second use of backslash provides a way of encoding non-printing characters
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters, apart from the binary zero that terminates a pattern,
-but when a pattern is being prepared by text editing, it is usually easier to
-use one of the following escape sequences than the binary character it
-represents:
+but when a pattern is being prepared by text editing, it is often easier to use
+one of the following escape sequences than the binary character it represents:
 <pre>
   \a        alarm, that is, the BEL character (hex 07)
   \cx       "control-x", where x is any character
@@ -334,7 +334,7 @@
 syntax for referencing a subpattern as a "subroutine". Details are discussed
 <a href="#onigurumasubroutines">later.</a>
 Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
-synonymous. The former is a back reference; the latter is a 
+synonymous. The former is a back reference; the latter is a
 <a href="#subpatternsassubroutines">subroutine</a>
 call.
 </P>
@@ -465,12 +465,13 @@
   (*BSR_ANYCRLF)   CR, LF, or CRLF only
   (*BSR_UNICODE)   any Unicode newline sequence
 </pre>
-These override the default and the options given to <b>pcre_compile()</b>, but
-they can be overridden by options given to <b>pcre_exec()</b>. Note that these
-special settings, which are not Perl-compatible, are recognized only at the
-very start of a pattern, and that they must be in upper case. If more than one
-of them is present, the last one is used. They can be combined with a change of
-newline convention, for example, a pattern can start with:
+These override the default and the options given to <b>pcre_compile()</b> or
+<b>pcre_compile2()</b>, but they can be overridden by options given to
+<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>. Note that these special settings,
+which are not Perl-compatible, are recognized only at the very start of a
+pattern, and that they must be in upper case. If more than one of them is
+present, the last one is used. They can be combined with a change of newline
+convention, for example, a pattern can start with:
 <pre>
   (*ANY)(*BSR_ANYCRLF)
 </pre>
@@ -731,7 +732,10 @@
 A word boundary is a position in the subject string where the current character
 and the previous character do not both match \w or \W (i.e. one matches
 \w and the other matches \W), or the start or end of the string if the
-first or last character matches \w, respectively.
+first or last character matches \w, respectively. Neither PCRE nor Perl has a
+separate "start of word" or "end of word" metasequence. However, whatever
+follows \b normally determines which it is. For example, the fragment
+\ba matches "a" at the start of a word.
 </P>
 <P>
 The \A, \Z, and \z assertions differ from the traditional circumflex and
@@ -862,15 +866,16 @@
 <br><a name="SEC8" href="#TOC1">SQUARE BRACKETS AND CHARACTER CLASSES</a><br>
 <P>
 An opening square bracket introduces a character class, terminated by a closing
-square bracket. A closing square bracket on its own is not special. If a
-closing square bracket is required as a member of the class, it should be the
-first data character in the class (after an initial circumflex, if present) or
-escaped with a backslash.
+square bracket. A closing square bracket on its own is not special by default.
+However, if the PCRE_JAVASCRIPT_COMPAT option is set, a lone closing square
+bracket causes a compile-time error. If a closing square bracket is required as
+a member of the class, it should be the first data character in the class
+(after an initial circumflex, if present) or escaped with a backslash.
 </P>
 <P>
 A character class matches a single character in the subject. In UTF-8 mode, the
-character may occupy more than one byte. A matched character must be in the set
-of characters defined by the class, unless the first character in the class
+character may be more than one byte long. A matched character must be in the
+set of characters defined by the class, unless the first character in the class
 definition is a circumflex, in which case the subject character must not be in
 the set defined by the class. If a circumflex is actually required as a member
 of the class, ensure it is not the first character, or escape it with a
@@ -881,7 +886,7 @@
 [^aeiou] matches any character that is not a lower case vowel. Note that a
 circumflex is just a convenient notation for specifying the characters that
 are in the class by enumerating those that are not. A class that starts with a
-circumflex is not an assertion: it still consumes a character from the subject
+circumflex is not an assertion; it still consumes a character from the subject
 string, and therefore it fails if the current pointer is at the end of the
 string.
 </P>
@@ -897,9 +902,9 @@
 case for characters whose values are less than 128, so caseless matching is
 always possible. For characters with higher values, the concept of case is
 supported if PCRE is compiled with Unicode property support, but not otherwise.
-If you want to use caseless matching for characters 128 and above, you must
-ensure that PCRE is compiled with Unicode property support as well as with
-UTF-8 support.
+If you want to use caseless matching in UTF-8 mode for characters 128 and above,
+you must ensure that PCRE is compiled with Unicode property support as well as
+with UTF-8 support.
 </P>
 <P>
 Characters that might indicate line breaks are never treated in any special way
@@ -1127,7 +1132,7 @@
 from left to right, and options are not reset until the end of the subpattern
 is reached, an option setting in one branch does affect subsequent branches, so
 the above patterns match "SUNDAY" as well as "Saturday".
-</P>
+<a name="dupsubpatternnumber"></a></P>
 <br><a name="SEC13" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br>
 <P>
 Perl 5.10 introduced a feature whereby each alternative in a subpattern uses
@@ -1152,8 +1157,22 @@
   / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
   # 1            2         2  3        2     3     4
 </pre>
-A backreference or a recursive call to a numbered subpattern always refers to
-the first one in the pattern with the given number.
+A backreference to a numbered subpattern uses the most recent value that is set
+for that number by any subpattern. The following pattern matches "abcabc" or
+"defdef":
+<pre>
+  /(?|(abc)|(def))\1/
+</pre>
+In contrast, a recursive or "subroutine" call to a numbered subpattern always
+refers to the first one in the pattern with the given number. The following
+pattern matches "abcabc" or "defabc":
+<pre>
+  /(?|(abc)|(def))(?1)/
+</pre>
+If a
+<a href="#conditions">condition test</a>
+for a subpattern's having matched refers to a non-unique number, the test is
+true if any of the subpatterns of that number have matched.
 </P>
 <P>
 An alternative approach to using this "branch reset" feature is to use
@@ -1167,7 +1186,8 @@
 difficulty, PCRE supports the naming of subpatterns. This feature was not
 added to Perl until release 5.10. Python had the feature earlier, and PCRE
 introduced it at release 4.0, using the Python syntax. PCRE now supports both
-the Perl and the Python syntax.
+the Perl and the Python syntax. Perl allows identically numbered subpatterns to
+have different names, but PCRE does not.
 </P>
 <P>
 In PCRE, a subpattern can be named in one of three ways: (?&#60;name&#62;...) or
@@ -1188,11 +1208,13 @@
 </P>
 <P>
 By default, a name must be unique within a pattern, but it is possible to relax
-this constraint by setting the PCRE_DUPNAMES option at compile time. This can
-be useful for patterns where only one instance of the named parentheses can
-match. Suppose you want to match the name of a weekday, either as a 3-letter
-abbreviation or as the full name, and in both cases you want to extract the
-abbreviation. This pattern (ignoring the line breaks) does the job:
+this constraint by setting the PCRE_DUPNAMES option at compile time. (Duplicate
+names are also always permitted for subpatterns with the same number, set up as
+described in the previous section.) Duplicate names can be useful for patterns
+where only one instance of the named parentheses can match. Suppose you want to
+match the name of a weekday, either as a 3-letter abbreviation or as the full
+name, and in both cases you want to extract the abbreviation. This pattern
+(ignoring the line breaks) does the job:
 <pre>
   (?&#60;DN&#62;Mon|Fri|Sun)(?:day)?|
   (?&#60;DN&#62;Tue)(?:sday)?|
@@ -1207,17 +1229,29 @@
 <P>
 The convenience function for extracting the data by name returns the substring
 for the first (and in this example, the only) subpattern of that name that
-matched. This saves searching to find which numbered subpattern it was. If you
-make a reference to a non-unique named subpattern from elsewhere in the
-pattern, the one that corresponds to the lowest number is used. For further
-details of the interfaces for handling named subpatterns, see the
+matched. This saves searching to find which numbered subpattern it was.
+</P>
+<P>
+If you make a backreference to a non-unique named subpattern from elsewhere in
+the pattern, the one that corresponds to the first occurrence of the name is
+used. In the absence of duplicate numbers (see the previous section) this is
+the one with the lowest number. If you use a named reference in a condition
+test (see the
+<a href="#conditions">section about conditions</a>
+below), either to check whether a subpattern has matched, or to check for
+recursion, all subpatterns with the same name are tested. If the condition is
+true for any one of them, the overall condition is true. This is the same
+behaviour as testing by number. For further details of the interfaces for
+handling named subpatterns, see the
 <a href="pcreapi.html"><b>pcreapi</b></a>
 documentation.
 </P>
 <P>
 <b>Warning:</b> You cannot use different names to distinguish between two
-subpatterns with the same number (see the previous section) because PCRE uses
-only the numbers when matching.
+subpatterns with the same number because PCRE uses only the numbers when
+matching. For this reason, an error is given at compile time if different names
+are given to subpatterns with the same number. However, you can give the same
+name to subpatterns with the same number, even when PCRE_DUPNAMES is not set.
 </P>
 <br><a name="SEC15" href="#TOC1">REPETITION</a><br>
 <P>
@@ -1233,6 +1267,7 @@
   a character class
   a back reference (see next section)
   a parenthesized subpattern (unless it is an assertion)
+  a recursive or "subroutine" call to a subpattern
 </pre>
 The general repetition quantifier specifies a minimum and maximum number of
 permitted matches, by giving the two numbers in curly brackets (braces),
@@ -1564,16 +1599,20 @@
 <P>
 There may be more than one back reference to the same subpattern. If a
 subpattern has not actually been used in a particular match, any back
-references to it always fail. For example, the pattern
+references to it always fail by default. For example, the pattern
 <pre>
   (a|(bc))\2
 </pre>
-always fails if it starts to match "a" rather than "bc". Because there may be
-many capturing parentheses in a pattern, all digits following the backslash are
-taken as part of a potential back reference number. If the pattern continues
-with a digit character, some delimiter must be used to terminate the back
-reference. If the PCRE_EXTENDED option is set, this can be whitespace.
-Otherwise an empty comment (see
+always fails if it starts to match "a" rather than "bc". However, if the
+PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back reference to an
+unset value matches an empty string.
+</P>
+<P>
+Because there may be many capturing parentheses in a pattern, all digits
+following a backslash are taken as part of a potential back reference number.
+If the pattern continues with a digit character, some delimiter must be used to
+terminate the back reference. If the PCRE_EXTENDED option is set, this can be
+whitespace. Otherwise, the \g{ syntax or an empty comment (see
 <a href="#comments">"Comments"</a>
 below) can be used.
 </P>
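
The default behaviour described in the hunk above (a back reference to a group that took no part in the match always fails) can be tried out with the same example pattern. This sketch uses Python's standard re module as a stand-in, which is an assumption: it is a different engine that happens to share this default, and it has no equivalent of PCRE_JAVASCRIPT_COMPAT.

```python
import re

# Same pattern as in the documentation text: group 2 is set only
# when the "bc" branch is taken.
pattern = re.compile(r"(a|(bc))\2")

# Group 2 captured "bc", so \2 can match the second "bc".
via_second_branch = pattern.fullmatch("bcbc")

# The first branch matched "a"; group 2 is unset, so \2 fails
# and there is no overall match.
via_first_branch = pattern.fullmatch("aa")
```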
@@ -1641,6 +1680,8 @@
 If you want to force a matching failure at some point in a pattern, the most
 convenient way to do it is with (?!) because an empty string always matches, so
 an assertion that requires there not to be an empty string must always fail.
+The Perl 5.10 backtracking control verb (*FAIL) or (*F) is essentially a
+synonym for (?!).
 <a name="lookbehind"></a></P>
 <br><b>
 Lookbehind assertions
@@ -1677,7 +1718,7 @@
 </pre>
 In some cases, the Perl 5.10 escape sequence \K
 <a href="#resetmatchstart">(see above)</a>
-can be used instead of a lookbehind assertion to get round the fixed-length 
+can be used instead of a lookbehind assertion to get round the fixed-length
 restriction.
 </P>
 <P>
@@ -1695,14 +1736,14 @@
 <P>
 <a href="#subpatternsassubroutines">"Subroutine"</a>
 calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long
-as the subpattern matches a fixed-length string. 
+as the subpattern matches a fixed-length string.
 <a href="#recursion">Recursion,</a>
 however, is not supported.
 </P>
 <P>
 Possessive quantifiers can be used in conjunction with lookbehind assertions to
-specify efficient matching at the end of the subject string. Consider a simple
-pattern such as
+specify efficient matching of fixed-length strings at the end of subject
+strings. Consider a simple pattern such as
 <pre>
   abcd$
 </pre>
@@ -1764,8 +1805,8 @@
 <P>
 It is possible to cause the matching process to obey a subpattern
 conditionally or to choose between two alternative subpatterns, depending on
-the result of an assertion, or whether a previous capturing subpattern matched
-or not. The two possible forms of conditional subpattern are
+the result of an assertion, or whether a specific capturing subpattern has
+already been matched. The two possible forms of conditional subpattern are:
 <pre>
   (?(condition)yes-pattern)
   (?(condition)yes-pattern|no-pattern)
@@ -1783,12 +1824,16 @@
 </b><br>
 <P>
 If the text between the parentheses consists of a sequence of digits, the
-condition is true if the capturing subpattern of that number has previously
-matched. An alternative notation is to precede the digits with a plus or minus
-sign. In this case, the subpattern number is relative rather than absolute.
-The most recently opened parentheses can be referenced by (?(-1), the next most
-recent by (?(-2), and so on. In looping constructs it can also make sense to
-refer to subsequent groups with constructs such as (?(+2).
+condition is true if a capturing subpattern of that number has previously
+matched. If there is more than one capturing subpattern with the same number
+(see the earlier
+<a href="#dupsubpatternnumber">section about duplicate subpattern numbers),</a>
+the condition is true if any of them have been set. An alternative notation is
+to precede the digits with a plus or minus sign. In this case, the subpattern
+number is relative rather than absolute. The most recently opened parentheses
+can be referenced by (?(-1), the next most recent by (?(-2), and so on. In
+looping constructs it can also make sense to refer to subsequent groups with
+constructs such as (?(+2).
 </P>
 <P>
 Consider the following pattern, which contains non-significant white space to
@@ -1832,8 +1877,10 @@
 Rewriting the above example to use a named subpattern gives this:
 <pre>
   (?&#60;OPEN&#62; \( )?    [^()]+    (?(&#60;OPEN&#62;) \) )
-
-</PRE>
+</pre>
+If the name used in a condition of this kind is a duplicate, the test is
+applied to all subpatterns of the same name, and is true if any one of them has
+matched.
 </P>
 <br><b>
 Checking for pattern recursion
@@ -1846,14 +1893,16 @@
 <pre>
   (?(R3)...) or (?(R&name)...)
 </pre>
-the condition is true if the most recent recursion is into the subpattern whose
+the condition is true if the most recent recursion is into a subpattern whose
 number or name is given. This condition does not check the entire recursion
-stack.
+stack. If the name used in a condition of this kind is a duplicate, the test is
+applied to all subpatterns of the same name, and is true if any one of them is
+the most recent recursion.
 </P>
 <P>
-At "top level", all these recursion test conditions are false. 
-<a href="#recursion">Recursive patterns</a>
-are described below.
+At "top level", all these recursion test conditions are false.
+<a href="#recursion">The syntax for recursive patterns</a>
+is described below.
 </P>
 <br><b>
 Defining subpatterns for use by reference only
@@ -1863,7 +1912,7 @@
 name DEFINE, the condition is always false. In this case, there may be only one
 alternative in the subpattern. It is always skipped if control reaches this
 point in the pattern; the idea of DEFINE is that it can be used to define
-"subroutines" that can be referenced from elsewhere. (The use of 
+"subroutines" that can be referenced from elsewhere. (The use of
 <a href="#subpatternsassubroutines">"subroutines"</a>
 is described below.) For example, a pattern to match an IPv4 address could be
 written like this (ignore whitespace and line breaks):
@@ -1874,13 +1923,10 @@
 The first part of the pattern is a DEFINE group inside which another group
 named "byte" is defined. This matches an individual component of an IPv4
 address (a number less than 256). When matching takes place, this part of the
-pattern is skipped because DEFINE acts like a false condition.
+pattern is skipped because DEFINE acts like a false condition. The rest of the
+pattern uses references to the named group to match the four dot-separated
+components of an IPv4 address, insisting on a word boundary at each end.
 </P>
-<P>
-The rest of the pattern uses references to the named group to match the four
-dot-separated components of an IPv4 address, insisting on a word boundary at
-each end.
-</P>
 <br><b>
 Assertion conditions
 </b><br>
@@ -1939,7 +1985,7 @@
 <P>
 A special item that consists of (? followed by a number greater than zero and a
 closing parenthesis is a recursive call of the subpattern of the given number,
-provided that it occurs inside that subpattern. (If not, it is a 
+provided that it occurs inside that subpattern. (If not, it is a
 <a href="#subpatternsassubroutines">"subroutine"</a>
 call, which is described in the next section.) The special item (?R) or (?0) is
 a recursive call of the entire regular expression.
@@ -1948,25 +1994,26 @@
 This PCRE pattern solves the nested parentheses problem (assume the
 PCRE_EXTENDED option is set so that white space is ignored):
 <pre>
-  \( ( (?&#62;[^()]+) | (?R) )* \)
+  \( ( [^()]++ | (?R) )* \)
 </pre>
 First it matches an opening parenthesis. Then it matches any number of
 substrings which can either be a sequence of non-parentheses, or a recursive
 match of the pattern itself (that is, a correctly parenthesized substring).
-Finally there is a closing parenthesis.
+Finally there is a closing parenthesis. Note the use of a possessive quantifier
+to avoid backtracking into sequences of non-parentheses.
 </P>
 <P>
 If this were part of a larger pattern, you would not want to recurse the entire
 pattern, so instead you could use this:
 <pre>
-  ( \( ( (?&#62;[^()]+) | (?1) )* \) )
+  ( \( ( [^()]++ | (?1) )* \) )
 </pre>
 We have put the pattern into parentheses, and caused the recursion to refer to
 them instead of the whole pattern.
 </P>
 <P>
 In a larger pattern, keeping track of parenthesis numbers can be tricky. This
-is made easier by the use of relative references. (A Perl 5.10 feature.)
+is made easier by the use of relative references (a Perl 5.10 feature).
 Instead of (?1) in the pattern above you can write (?-2) to refer to the second
 most recently opened parentheses preceding the recursion. In other words, a
 negative number counts capturing parentheses leftwards from the point at which
@@ -1984,20 +2031,20 @@
 for this is (?&name); PCRE's earlier syntax (?P&#62;name) is also supported. We
 could rewrite the above example as follows:
 <pre>
-  (?&#60;pn&#62; \( ( (?&#62;[^()]+) | (?&pn) )* \) )
+  (?&#60;pn&#62; \( ( [^()]++ | (?&pn) )* \) )
 </pre>
 If there is more than one subpattern with the same name, the earliest one is
 used.
 </P>
 <P>
 This particular example pattern that we have been looking at contains nested
-unlimited repeats, and so the use of atomic grouping for matching strings of
-non-parentheses is important when applying the pattern to strings that do not
-match. For example, when this pattern is applied to
+unlimited repeats, and so the use of a possessive quantifier for matching
+strings of non-parentheses is important when applying the pattern to strings
+that do not match. For example, when this pattern is applied to
 <pre>
   (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
 </pre>
-it yields "no match" quickly. However, if atomic grouping is not used,
+it yields "no match" quickly. However, if a possessive quantifier is not used,
 the match runs for a very long time indeed because there are so many different
 ways the + and * repeats can carve up the subject, and all have to be tested
 before failure can be reported.
@@ -2015,7 +2062,7 @@
 the value for the capturing parentheses is "ef", which is the last value taken
 on at the top level. If additional parentheses are added, giving
 <pre>
-  \( ( ( (?&#62;[^()]+) | (?R) )* ) \)
+  \( ( ( [^()]++ | (?R) )* ) \)
      ^                        ^
      ^                        ^
 </pre>
@@ -2044,19 +2091,19 @@
 In PCRE (like Python, but unlike Perl), a recursive subpattern call is always
 treated as an atomic group. That is, once it has matched some of the subject
 string, it is never re-entered, even if it contains untried alternatives and
-there is a subsequent matching failure. This can be illustrated by the 
-following pattern, which purports to match a palindromic string that contains 
+there is a subsequent matching failure. This can be illustrated by the
+following pattern, which purports to match a palindromic string that contains
 an odd number of characters (for example, "a", "aba", "abcba", "abcdcba"):
 <pre>
   ^(.|(.)(?1)\2)$
 </pre>
-The idea is that it either matches a single character, or two identical 
-characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE 
+The idea is that it either matches a single character, or two identical
+characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE
 it does not if the subject string is longer than three characters. Consider the
 subject string "abcba":
 </P>
 <P>
-At the top level, the first character is matched, but as it is not at the end 
+At the top level, the first character is matched, but as it is not at the end
 of the string, the first alternative fails; the second alternative is taken
 and the recursion kicks in. The recursive call to subpattern 1 successfully
 matches the next character ("b"). (Note that the beginning and end of line
@@ -2064,7 +2111,7 @@
 </P>
 <P>
 Back at the top level, the next character ("c") is compared with what
-subpattern 2 matched, which was "a". This fails. Because the recursion is 
+subpattern 2 matched, which was "a". This fails. Because the recursion is
 treated as an atomic group, there are now no backtracking points, and so the
 entire match fails. (Perl is able, at this point, to re-enter the recursion and
 try the second alternative.) However, if the pattern is written with the
@@ -2072,36 +2119,44 @@
 <pre>
   ^((.)(?1)\2|.)$
 </pre>
-This time, the recursing alternative is tried first, and continues to recurse 
-until it runs out of characters, at which point the recursion fails. But this 
-time we do have another alternative to try at the higher level. That is the big 
+This time, the recursing alternative is tried first, and continues to recurse
+until it runs out of characters, at which point the recursion fails. But this
+time we do have another alternative to try at the higher level. That is the big
 difference: in the previous case the remaining alternative is at a deeper
 recursion level, which PCRE cannot use.
 </P>
 <P>
-To change the pattern so that it matches all palindromic strings, not just those 
+To change the pattern so that it matches all palindromic strings, not just those
 with an odd number of characters, it is tempting to change the pattern to this:
 <pre>
   ^((.)(?1)\2|.?)$
 </pre>
-Again, this works in Perl, but not in PCRE, and for the same reason. When a 
-deeper recursion has matched a single character, it cannot be entered again in 
-order to match an empty string. The solution is to separate the two cases, and 
+Again, this works in Perl, but not in PCRE, and for the same reason. When a
+deeper recursion has matched a single character, it cannot be entered again in
+order to match an empty string. The solution is to separate the two cases, and
 write out the odd and even cases as alternatives at the higher level:
 <pre>
   ^(?:((.)(?1)\2|)|((.)(?3)\4|.))
 </pre>
-If you want to match typical palindromic phrases, the pattern has to ignore all 
+If you want to match typical palindromic phrases, the pattern has to ignore all
 non-word characters, which can be done like this:
 <pre>
   ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
 </pre>
-If run with the PCRE_CASELESS option, this pattern matches phrases such as "A 
-man, a plan, a canal: Panama!" and it works well in both PCRE and Perl. Note 
-the use of the possessive quantifier *+ to avoid backtracking into sequences of 
+If run with the PCRE_CASELESS option, this pattern matches phrases such as "A
+man, a plan, a canal: Panama!" and it works well in both PCRE and Perl. Note
+the use of the possessive quantifier *+ to avoid backtracking into sequences of
 non-word characters. Without this, PCRE takes a great deal longer (ten times or
 more) to match typical phrases, and Perl takes so long that you think it has
 gone into a loop.
+</P>
+<P>
+<b>WARNING</b>: The palindrome-matching patterns above work only if the subject
+string does not start with a palindrome that is shorter than the entire string.
+For example, although "abcba" is correctly matched, if the subject is "ababa",
+PCRE finds the palindrome "aba" at the start, then fails at top level because
+the end of the string does not follow. Once again, it cannot jump back into the
+recursion to try other alternatives, so the entire match fails.
 <a name="subpatternsassubroutines"></a></P>
 <br><a name="SEC22" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
 <P>
@@ -2212,9 +2267,9 @@
 <b>pcre_dfa_exec()</b>.
 </P>
 <P>
-If any of these verbs are used in an assertion subpattern, their effect is 
+If any of these verbs are used in an assertion subpattern, their effect is
 confined to that subpattern; it does not extend to the surrounding pattern.
-Note that assertion subpatterns are processed as anchored at the point where 
+Note that assertion subpatterns are processed as anchored at the point where
 they are tested.
 </P>
 <P>
@@ -2234,12 +2289,12 @@
 </pre>
 This verb causes the match to end successfully, skipping the remainder of the
 pattern. When inside a recursion, only the innermost pattern is ended
-immediately. If the (*ACCEPT) is inside capturing parentheses, the data so far
-is captured. (This feature was added to PCRE at release 8.00.) For example:
+immediately. If (*ACCEPT) is inside capturing parentheses, the data so far is
+captured. (This feature was added to PCRE at release 8.00.) For example:
 <pre>
   A((?:A|B(*ACCEPT)|C)D)
 </pre>
-This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by 
+This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
 the outer parentheses.
 <pre>
   (*FAIL) or (*F)
@@ -2267,7 +2322,7 @@
 </pre>
 This verb causes the whole match to fail outright if the rest of the pattern
 does not match. Even if the pattern is unanchored, no further attempts to find
-a match by advancing the start point take place. Once (*COMMIT) has been
+a match by advancing the starting point take place. Once (*COMMIT) has been
 passed, <b>pcre_exec()</b> is committed to finding a match at the current
 starting point, or not at all. For example:
 <pre>
@@ -2299,7 +2354,7 @@
 If the subject is "aaaac...", after the first match attempt fails (starting at
 the first character in the string), the starting point skips on to start the
 next attempt at "c". Note that a possessive quantifier does not have the same
-effect in this example; although it would suppress backtracking during the
+effect as this example; although it would suppress backtracking during the
 first match attempt, the second attempt would start at the second character
 instead of skipping on to "c".
 <pre>
@@ -2319,7 +2374,8 @@
 </P>
 <br><a name="SEC26" href="#TOC1">SEE ALSO</a><br>
 <P>
-<b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), <b>pcre</b>(3).
+<b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3),
+<b>pcresyntax</b>(3), <b>pcre</b>(3).
 </P>
 <br><a name="SEC27" href="#TOC1">AUTHOR</a><br>
 <P>
@@ -2332,7 +2388,7 @@
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 22 September 2009
+Last updated: 04 October 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
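
The palindrome patterns discussed in the pcrepattern.html hunk above are easy to get subtly wrong, so it can help to have an independent reference check when testing them. The following is a minimal sketch in plain C that does not use PCRE at all; the function names is_word_char and is_palindromic_phrase are illustrative and not part of this commit or of the library. It assumes ASCII input in the "C" locale and treats letters, digits, and underscore as word characters, which is an approximation of \w. It skips non-word characters and folds case, roughly what the phrase pattern does when run with PCRE_CASELESS.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Approximation of \w for ASCII input: letter, digit, or underscore. */
static int is_word_char(unsigned char c)
{
  return isalnum(c) || c == '_';
}

/* Returns 1 if s reads the same in both directions once non-word
   characters are skipped and case is folded, otherwise 0.
   This is a hand-written reference check, not a regex match. */
static int is_palindromic_phrase(const char *s)
{
  size_t i, j;

  assert(s != NULL);
  i = 0;            /* index of the next unchecked character from the left */
  j = strlen(s);    /* one past the next unchecked character from the right */

  while (i < j) {
    if (!is_word_char((unsigned char)s[i])) { i++; continue; }
    if (!is_word_char((unsigned char)s[j - 1])) { j--; continue; }
    if (tolower((unsigned char)s[i]) != tolower((unsigned char)s[j - 1]))
      return 0;
    i++;
    j--;
  }
  return 1;
}
```

A check like this accepts "ababa", which is the subject that the WARNING paragraph says the recursive patterns fail on, so comparing the two on a list of test subjects is one way to spot that limitation.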


Modified: code/trunk/doc/html/pcreposix.html
===================================================================
--- code/trunk/doc/html/pcreposix.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcreposix.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -128,9 +128,9 @@
 <pre>
   REG_UNGREEDY
 </pre>
-The PCRE_UNGREEDY option is set when the regular expression is passed for 
+The PCRE_UNGREEDY option is set when the regular expression is passed for
 compilation to the native function. Note that REG_UNGREEDY is not part of the
-POSIX standard.   
+POSIX standard.
 <pre>
   REG_UTF8
 </pre>


Modified: code/trunk/doc/html/pcresample.html
===================================================================
--- code/trunk/doc/html/pcresample.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcresample.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -20,7 +20,7 @@
 is supplied in the file <i>pcredemo.c</i> in the PCRE distribution. A listing of
 this program is given in the
 <a href="pcredemo.html"><b>pcredemo</b></a>
-documentation. If you do not have a copy of the PCRE distribution, you can save 
+documentation. If you do not have a copy of the PCRE distribution, you can save
 this listing to re-create <i>pcredemo.c</i>.
 </P>
 <P>
@@ -38,8 +38,8 @@
 </P>
 <P>
 If PCRE is installed in the standard include and library directories for your
-system, you should be able to compile the demonstration program using this
-command:
+operating system, you should be able to compile the demonstration program using
+this command:
 <pre>
   gcc -o pcredemo pcredemo.c -lpcre
 </pre>
@@ -59,7 +59,7 @@
 Note that there is a much more comprehensive test program, called
 <a href="pcretest.html"><b>pcretest</b>,</a>
 which supports many more facilities for testing regular expressions and the
-PCRE library. The 
+PCRE library. The
 <a href="pcredemo.html"><b>pcredemo</b></a>
 program is provided as a simple coding example.
 </P>
@@ -93,7 +93,7 @@
 REVISION
 </b><br>
 <P>
-Last updated: 01 September 2009
+Last updated: 30 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcretest.html
===================================================================
--- code/trunk/doc/html/pcretest.html    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/html/pcretest.html    2009-10-05 10:59:35 UTC (rev 461)
@@ -248,7 +248,7 @@
 If any call to <b>pcre_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches an
 empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and
 PCRE_ANCHORED flags set in order to search for another, non-empty, match at the
-same point. If this second match fails, the start offset is advanced by one 
+same point. If this second match fails, the start offset is advanced by one
 character, and the normal match is retried. This imitates the way Perl handles
 such cases when using the <b>/g</b> modifier or the <b>split()</b> function.
 </P>
@@ -371,13 +371,14 @@
   \L         call pcre_get_substringlist() after a successful match
   \M         discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
   \N         pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>; if used twice, pass the
-               PCRE_NOTEMPTY_ATSTART option 
+               PCRE_NOTEMPTY_ATSTART option
   \Odd       set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits)
   \P         pass the PCRE_PARTIAL_SOFT option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>; if used twice, pass the
-               PCRE_PARTIAL_HARD option 
+               PCRE_PARTIAL_HARD option
   \Qdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
   \R         pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b>
   \S         output details of memory get/free calls during matching
+  \Y         pass the PCRE_NO_START_OPTIMIZE option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
   \Z         pass the PCRE_NOTEOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
   \?         pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
   \&#62;dd       start the match at offset dd (any number of digits);
@@ -540,7 +541,7 @@
 </pre>
 (Using the normal matching function on this data finds only "tang".) The
 longest matching string is always given first (and numbered zero). After a
-PCRE_ERROR_PARTIAL return, the output is "Partial match:", followed by the 
+PCRE_ERROR_PARTIAL return, the output is "Partial match:", followed by the
 partially matching substring.
 </P>
 <P>
@@ -708,7 +709,7 @@
 </P>
 <br><a name="SEC15" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 11 September 2009
+Last updated: 26 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
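
The /g handling of empty matches described in the pcretest.html hunk above can be sketched as a loop. The code below is an illustration only: it does not call PCRE, and everything prefixed toy_ or TOY_ is a hypothetical stand-in (a hand-written matcher for the pattern a* and two made-up option bits playing the roles of PCRE_ANCHORED and PCRE_NOTEMPTY_ATSTART). It advances by one byte after a failed retry, so it assumes a non-UTF-8 subject.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option bits; the real ones are defined in pcre.h. */
enum { TOY_ANCHORED = 1, TOY_NOTEMPTY_ATSTART = 2 };

/* Toy matcher for the pattern a* (a stand-in for pcre_exec()).
   Searches from `start`; on success returns 1 and sets [*ms, *me). */
static int toy_match_a_star(const char *s, size_t len, size_t start,
                            int opts, size_t *ms, size_t *me)
{
  size_t pos;
  for (pos = start; pos <= len; pos++) {
    size_t end = pos;
    while (end < len && s[end] == 'a') end++;
    /* Reject an empty match at the original starting point if asked to. */
    if (!(end == pos && pos == start && (opts & TOY_NOTEMPTY_ATSTART))) {
      *ms = pos;
      *me = end;
      return 1;
    }
    if (opts & TOY_ANCHORED) break;   /* anchored: only try at `start` */
  }
  return 0;
}

/* Collects up to `max` successive matches, as a /g loop would.
   Returns the number of matches stored in starts[] and ends[]. */
static size_t toy_global_match(const char *s, size_t starts[],
                               size_t ends[], size_t max)
{
  size_t len, offset = 0, n = 0;
  int opts = 0;

  assert(s != NULL);
  len = strlen(s);

  while (offset <= len && n < max) {
    size_t ms, me;
    if (toy_match_a_star(s, len, offset, opts, &ms, &me)) {
      starts[n] = ms;
      ends[n] = me;
      n++;
      /* After an empty match, retry at the same point, anchored and
         required to be non-empty; otherwise continue normally. */
      opts = (ms == me) ? (TOY_ANCHORED | TOY_NOTEMPTY_ATSTART) : 0;
      offset = me;
    } else {
      if (opts == 0) break;   /* a normal attempt failed: no more matches */
      opts = 0;               /* the special retry failed: advance one */
      offset++;               /* character and go back to normal matching */
    }
  }
  return n;
}
```

For the subject "baa" this loop yields an empty match at offset 0, then "aa" at offsets 1 to 3, then an empty match at offset 3, which is the sequence the description above implies for a* under /g.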


Modified: code/trunk/doc/pcre.txt
===================================================================
--- code/trunk/doc/pcre.txt    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcre.txt    2009-10-05 10:59:35 UTC (rev 461)
@@ -2,7 +2,7 @@
 This file contains a concatenation of the PCRE man pages, converted to plain
 text format for ease of searching with a text editor, or for use on systems
 that do not have a man page processor. The small individual files that give
-synopses of each function in the library have not been included. Neither has 
+synopses of each function in the library have not been included. Neither has
 the pcredemo program. There are separate text files for the pcregrep and
 pcretest commands.
 -----------------------------------------------------------------------------
@@ -19,23 +19,23 @@


        The  PCRE  library is a set of functions that implement regular expres-
        sion pattern matching using the same syntax and semantics as Perl, with
-       just  a  few  differences. Certain features that appeared in Python and
-       PCRE before they appeared in Perl are also available using  the  Python
-       syntax.  There is also some support for certain .NET and Oniguruma syn-
-       tax items, and there is an option for  requesting  some  minor  changes
-       that give better JavaScript compatibility.
+       just  a few differences. Some features that appeared in Python and PCRE
+       before they appeared in Perl are also available using the  Python  syn-
+       tax,  there  is  some  support for one or two .NET and Oniguruma syntax
+       items, and there is an option for requesting some  minor  changes  that
+       give better JavaScript compatibility.


-       The  current implementation of PCRE (release 8.xx) corresponds approxi-
-       mately with Perl 5.10, including support for UTF-8 encoded strings  and
-       Unicode general category properties. However, UTF-8 and Unicode support
-       has to be explicitly enabled; it is not the default. The Unicode tables
-       correspond to Unicode release 5.1.
+       The  current implementation of PCRE corresponds approximately with Perl
+       5.10, including support for UTF-8 encoded strings and  Unicode  general
+       category  properties.  However,  UTF-8  and  Unicode  support has to be
+       explicitly enabled; it is not the default. The  Unicode  tables  corre-
+       spond to Unicode release 5.1.


        In  addition to the Perl-compatible matching function, PCRE contains an
-       alternative matching function that matches the same  compiled  patterns
-       in  a different way. In certain circumstances, the alternative function
-       has some advantages. For a discussion of the two  matching  algorithms,
-       see the pcrematching page.
+       alternative function that matches the same compiled patterns in a  dif-
+       ferent way. In certain circumstances, the alternative function has some
+       advantages.  For a discussion of the two matching algorithms,  see  the
+       pcrematching page.


        PCRE  is  written  in C and released as a C library. A number of people
        have written wrappers and interfaces of various kinds.  In  particular,
@@ -55,8 +55,8 @@
        library is built. The pcre_config() function makes it  possible  for  a
        client  to  discover  which  features are available. The features them-
        selves are described in the pcrebuild page. Documentation about  build-
-       ing  PCRE for various operating systems can be found in the README file
-       in the source distribution.
+       ing  PCRE  for various operating systems can be found in the README and
+       NON-UNIX-USE files in the source distribution.


        The library contains a number of undocumented  internal  functions  and
        data  tables  that  are  used by more than one of the exported external
@@ -89,12 +89,12 @@
          pcrepartial       details of the partial matching facility
          pcrepattern       syntax and semantics of supported
                              regular expressions
-         pcresyntax        quick syntax reference
          pcreperform       discussion of performance issues
          pcreposix         the POSIX-compatible C API
          pcreprecompile    details of saving and re-using precompiled patterns
          pcresample        discussion of the pcredemo program
          pcrestack         discussion of stack usage
+         pcresyntax        quick syntax reference
          pcretest          description of the pcretest testing command


        In  addition,  in the "man" and HTML formats, there is a short page for
@@ -142,7 +142,7 @@
        with  the  PCRE_UTF8  option  flag,  or the pattern must start with the
        sequence (*UTF8). When either of these is the case,  both  the  pattern
        and  any  subject  strings  that  are matched against it are treated as
-       UTF-8 strings instead of just strings of bytes.
+       UTF-8 strings instead of strings of 1-byte characters.


        If you compile PCRE with UTF-8 support, but do not use it at run  time,
        the  library will be a bit bigger, but the additional run time overhead
@@ -263,11 +263,11 @@


REVISION

-       Last updated: 01 September 2009
+       Last updated: 28 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREBUILD(3)                                                      PCREBUILD(3)



@@ -324,7 +324,7 @@
        to  the  configure  command.  Of  itself, this does not make PCRE treat
        strings as UTF-8. As well as compiling PCRE with this option, you  also
        have to set the PCRE_UTF8 option when you call the pcre_compile()
-       function.
+       or pcre_compile2() functions.


        If you set --enable-utf8 when compiling in an EBCDIC environment,  PCRE
        expects its input to be either ASCII or UTF-8 (depending on the runtime
@@ -432,9 +432,9 @@
        nation metacharacter). By default, two-byte values are used  for  these
        offsets,  leading  to  a  maximum size for a compiled pattern of around
        64K. This is sufficient to handle all but the most  gigantic  patterns.
-       Nevertheless,  some  people do want to process enormous patterns, so it
-       is possible to compile PCRE to use three-byte or four-byte  offsets  by
-       adding a setting such as
+       Nevertheless,  some  people do want to process truly enormous patterns,
+       so it is possible to compile PCRE to use three-byte or  four-byte  off-
+       sets by adding a setting such as


          --with-link-size=3


@@ -461,7 +461,7 @@
        to the configure command. With this configuration, PCRE  will  use  the
        pcre_stack_malloc  and pcre_stack_free variables to call memory manage-
        ment functions. By default these point to malloc() and free(), but  you
-       can replace the pointers so that your own functions are used.
+       can replace the pointers so that your own functions are used instead.


        Separate  functions  are  provided  rather  than  using pcre_malloc and
        pcre_free because the  usage  is  very  predictable:  the  block  sizes
@@ -469,70 +469,69 @@
        reverse order. A calling program might be able to  implement  optimized
        functions  that  perform  better  than  malloc()  and free(). PCRE runs
        noticeably more slowly when built in this way. This option affects only
-       the   pcre_exec()   function;   it   is   not   relevant  for  the  the
-       pcre_dfa_exec() function.
+       the pcre_exec() function; it is not relevant for pcre_dfa_exec().



LIMITING PCRE RESOURCE USAGE

-       Internally, PCRE has a function called match(), which it calls  repeat-
-       edly   (sometimes   recursively)  when  matching  a  pattern  with  the
-       pcre_exec() function. By controlling the maximum number of  times  this
-       function  may be called during a single matching operation, a limit can
-       be placed on the resources used by a single call  to  pcre_exec().  The
-       limit  can be changed at run time, as described in the pcreapi documen-
-       tation. The default is 10 million, but this can be changed by adding  a
+       Internally,  PCRE has a function called match(), which it calls repeat-
+       edly  (sometimes  recursively)  when  matching  a  pattern   with   the
+       pcre_exec()  function.  By controlling the maximum number of times this
+       function may be called during a single matching operation, a limit  can
+       be  placed  on  the resources used by a single call to pcre_exec(). The
+       limit can be changed at run time, as described in the pcreapi  documen-
+       tation.  The default is 10 million, but this can be changed by adding a
        setting such as


          --with-match-limit=500000


-       to   the   configure  command.  This  setting  has  no  effect  on  the
+       to  the  configure  command.  This  setting  has  no  effect   on   the
        pcre_dfa_exec() matching function.


-       In some environments it is desirable to limit the  depth  of  recursive
+       In  some  environments  it is desirable to limit the depth of recursive
        calls of match() more strictly than the total number of calls, in order
-       to restrict the maximum amount of stack (or heap,  if  --disable-stack-
+       to  restrict  the maximum amount of stack (or heap, if --disable-stack-
        for-recursion is specified) that is used. A second limit controls this;
-       it defaults to the value that  is  set  for  --with-match-limit,  which
-       imposes  no  additional constraints. However, you can set a lower limit
+       it  defaults  to  the  value  that is set for --with-match-limit, which
+       imposes no additional constraints. However, you can set a  lower  limit
        by adding, for example,


          --with-match-limit-recursion=10000


-       to the configure command. This value can  also  be  overridden  at  run
+       to  the  configure  command.  This  value can also be overridden at run
        time.



CREATING CHARACTER TABLES AT BUILD TIME

-       PCRE  uses fixed tables for processing characters whose code values are
-       less than 256. By default, PCRE is built with a set of tables that  are
-       distributed  in  the  file pcre_chartables.c.dist. These tables are for
+       PCRE uses fixed tables for processing characters whose code values  are
+       less  than 256. By default, PCRE is built with a set of tables that are
+       distributed in the file pcre_chartables.c.dist. These  tables  are  for
        ASCII codes only. If you add


          --enable-rebuild-chartables


-       to the configure command, the distributed tables are  no  longer  used.
-       Instead,  a  program  called dftables is compiled and run. This outputs
+       to  the  configure  command, the distributed tables are no longer used.
+       Instead, a program called dftables is compiled and  run.  This  outputs
        the source for a new set of tables, created in the default locale of your
        C runtime system. (This method of replacing the tables does not work if
-       you are cross compiling, because dftables is run on the local host.  If
-       you  need  to  create alternative tables when cross compiling, you will
+       you  are cross compiling, because dftables is run on the local host. If
+       you need to create alternative tables when cross  compiling,  you  will
        have to do so "by hand".)



USING EBCDIC CODE

-       PCRE assumes by default that it will run in an  environment  where  the
-       character  code  is  ASCII  (or Unicode, which is a superset of ASCII).
-       This is the case for most computer operating systems.  PCRE  can,  how-
+       PCRE  assumes  by  default that it will run in an environment where the
+       character code is ASCII (or Unicode, which is  a  superset  of  ASCII).
+       This  is  the  case for most computer operating systems. PCRE can, how-
        ever, be compiled to run in an EBCDIC environment by adding


          --enable-ebcdic


        to the configure command. This setting implies --enable-rebuild-charta-
-       bles. You should only use it if you know that  you  are  in  an  EBCDIC
-       environment  (for  example,  an  IBM  mainframe  operating system). The
+       bles.  You  should  only  use  it if you know that you are in an EBCDIC
+       environment (for example,  an  IBM  mainframe  operating  system).  The
        --enable-ebcdic option is incompatible with --enable-utf8.



@@ -546,7 +545,7 @@
          --enable-pcregrep-libbz2


        to the configure command. These options naturally require that the rel-
-       evant  libraries  are installed on your system. Configuration will fail
+       evant libraries are installed on your system. Configuration  will  fail
        if they are not.



@@ -556,24 +555,24 @@

          --enable-pcretest-libreadline


-       to the configure command,  pcretest  is  linked  with  the  libreadline
-       library,  and  when its input is from a terminal, it reads it using the
+       to  the  configure  command,  pcretest  is  linked with the libreadline
+       library, and when its input is from a terminal, it reads it  using  the
        readline() function. This provides line-editing and history facilities.
-       Note that libreadline is GPL-licenced, so if you distribute a binary of
+       Note that libreadline is GPL-licensed, so if you distribute a binary of
        pcretest linked in this way, there may be licensing issues.


-       Setting this option causes the -lreadline option to  be  added  to  the
-       pcretest build.  In many operating environments with a system-installed
+       Setting  this  option  causes  the -lreadline option to be added to the
+       pcretest build. In many operating environments with  a system-installed
        libreadline this is sufficient. However, in some environments (e.g.  if
-       an  unmodified  distribution version of readline is in use), some extra
-       configuration may be necessary. The INSTALL file for  libreadline  says
+       an unmodified distribution version of readline is in use),  some  extra
+       configuration  may  be necessary. The INSTALL file for libreadline says
        this:


          "Readline uses the termcap functions, but does not link with the
          termcap or curses library itself, allowing applications which link
          with readline the to choose an appropriate library."


-       If  your environment has not been set up so that an appropriate library
+       If your environment has not been set up so that an appropriate  library
        is automatically included, you may need to add something like


          LIBS="-lncurses"
@@ -595,11 +594,11 @@


REVISION

-       Last updated: 06 September 2009
+       Last updated: 29 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREMATCHING(3)                                                PCREMATCHING(3)



@@ -683,13 +682,19 @@
        though  it is not implemented as a traditional finite state machine (it
        keeps multiple states active simultaneously).


+       Although the general principle of this matching algorithm  is  that  it
+       scans  the subject string only once, without backtracking, there is one
+       exception: when a lookaround assertion is encountered,  the  characters
+       following  or  preceding  the  current  point  have to be independently
+       inspected.
+
        The scan continues until either the end of the subject is  reached,  or
        there  are  no more unterminated paths. At this point, terminated paths
        represent the different matching possibilities (if there are none,  the
        match  has  failed).   Thus,  if there is more than one possible match,
        this algorithm finds all of them, and in particular, it finds the long-
-       est.  In PCRE, there is an option to stop the algorithm after the first
-       match (which is necessarily the shortest) has been found.
+       est.  There  is  an  option to stop the algorithm after the first match
+       (which is necessarily the shortest) is found.


        Note that all the matches that are found start at the same point in the
        subject. If the pattern
@@ -701,73 +706,69 @@
        at the fourth character of the subject. The algorithm does not automat-
        ically move on to find matches that start at later positions.


-       Although the general principle of this matching algorithm  is  that  it
-       scans  the subject string only once, without backtracking, there is one
-       exception: when a lookbehind assertion is  encountered,  the  preceding
-       characters have to be re-inspected.
-
        There are a number of features of PCRE regular expressions that are not
        supported by the alternative matching algorithm. They are as follows:


-       1. Because the algorithm finds all  possible  matches,  the  greedy  or
-       ungreedy  nature  of repetition quantifiers is not relevant. Greedy and
+       1.  Because  the  algorithm  finds  all possible matches, the greedy or
+       ungreedy nature of repetition quantifiers is not relevant.  Greedy  and
        ungreedy quantifiers are treated in exactly the same way. However, pos-
-       sessive  quantifiers can make a difference when what follows could also
+       sessive quantifiers can make a difference when what follows could  also
        match what is quantified, for example in a pattern like this:


          ^a++\w!


-       This pattern matches "aaab!" but not "aaa!", which would be matched  by
-       a  non-possessive quantifier. Similarly, if an atomic group is present,
-       it is matched as if it were a standalone pattern at the current  point,
-       and  the  longest match is then "locked in" for the rest of the overall
+       This  pattern matches "aaab!" but not "aaa!", which would be matched by
+       a non-possessive quantifier. Similarly, if an atomic group is  present,
+       it  is matched as if it were a standalone pattern at the current point,
+       and the longest match is then "locked in" for the rest of  the  overall
        pattern.


        2. When dealing with multiple paths through the tree simultaneously, it
-       is  not  straightforward  to  keep track of captured substrings for the
-       different matching possibilities, and  PCRE's  implementation  of  this
+       is not straightforward to keep track of  captured  substrings  for  the
+       different  matching  possibilities,  and  PCRE's implementation of this
        algorithm does not attempt to do this. This means that no captured sub-
        strings are available.


-       3. Because no substrings are captured, back references within the  pat-
+       3.  Because no substrings are captured, back references within the pat-
        tern are not supported, and cause errors if encountered.


-       4.  For  the same reason, conditional expressions that use a backrefer-
-       ence as the condition or test for a specific group  recursion  are  not
+       4. For the same reason, conditional expressions that use  a  backrefer-
+       ence  as  the  condition or test for a specific group recursion are not
        supported.


-       5.  Because  many  paths  through the tree may be active, the \K escape
+       5. Because many paths through the tree may be  active,  the  \K  escape
        sequence, which resets the start of the match when encountered (but may
-       be  on  some  paths  and not on others), is not supported. It causes an
+       be on some paths and not on others), is not  supported.  It  causes  an
        error if encountered.


-       6. Callouts are supported, but the value of the  capture_top  field  is
+       6.  Callouts  are  supported, but the value of the capture_top field is
        always 1, and the value of the capture_last field is always -1.


-       7.  The \C escape sequence, which (in the standard algorithm) matches a
-       single byte, even in UTF-8 mode, is not supported because the  alterna-
-       tive  algorithm  moves  through  the  subject string one character at a
+       7. The \C escape sequence, which (in the standard algorithm) matches  a
+       single  byte, even in UTF-8 mode, is not supported because the alterna-
+       tive algorithm moves through the subject  string  one  character  at  a
        time, for all active paths through the tree.


-       8. Except for (*FAIL), the backtracking control verbs such as  (*PRUNE)
-       are  not  supported.  (*FAIL)  is supported, and behaves like a failing
+       8.  Except for (*FAIL), the backtracking control verbs such as (*PRUNE)
+       are not supported. (*FAIL) is supported, and  behaves  like  a  failing
        negative assertion.



ADVANTAGES OF THE ALTERNATIVE ALGORITHM

-       Using the alternative matching algorithm provides the following  advan-
+       Using  the alternative matching algorithm provides the following advan-
        tages:


        1. All possible matches (at a single point in the subject) are automat-
-       ically found, and in particular, the longest match is  found.  To  find
+       ically  found,  and  in particular, the longest match is found. To find
        more than one match using the standard algorithm, you have to do kludgy
        things with callouts.


-       2. Because the alternative algorithm  scans  the  subject  string  just
-       once,  and  never  needs to backtrack, it is possible to pass very long
-       subject strings to the matching function in  several  pieces,  checking
-       for partial matching each time.
+       2.  Because  the  alternative  algorithm  scans the subject string just
+       once, and never needs to backtrack, it is possible to  pass  very  long
+       subject  strings  to  the matching function in several pieces, checking
+       for partial matching each time.  The  pcrepartial  documentation  gives
+       details of partial matching.



DISADVANTAGES OF THE ALTERNATIVE ALGORITHM
@@ -793,11 +794,11 @@

REVISION

-       Last updated: 05 September 2009
+       Last updated: 29 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREAPI(3)                                                          PCREAPI(3)



@@ -1126,7 +1127,9 @@
        Either of the functions pcre_compile() or pcre_compile2() can be called
        to compile a pattern into an internal form. The only difference between
        the  two interfaces is that pcre_compile2() has an additional argument,
-       errorcodeptr, via which a numerical error code can be returned.
+       errorcodeptr, via which a numerical error  code  can  be  returned.  To
+       avoid  too  much repetition, we refer just to pcre_compile() below, but
+       the information applies equally to pcre_compile2().


        The pattern is a C string terminated by a binary zero, and is passed in
        the  pattern  argument.  A  pointer to a single block of memory that is
@@ -1144,20 +1147,20 @@
        The options argument contains various bit settings that affect the com-
        pilation. It should be zero if no options are required.  The  available
        options  are  described  below. Some of them (in particular, those that
-       are compatible with Perl, but also some others) can  also  be  set  and
+       are compatible with Perl, but some others as well) can also be set  and
        unset  from  within  the  pattern  (see the detailed description in the
        pcrepattern documentation). For those options that can be different  in
        different  parts  of  the pattern, the contents of the options argument
-       specifies their initial settings at the start of compilation and execu-
-       tion.  The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at the
-       time of matching as well as at compile time.
+       specifies their settings at the start of compilation and execution. The
+       PCRE_ANCHORED, PCRE_BSR_xxx, and PCRE_NEWLINE_xxx options can be set at
+       the time of matching as well as at compile time.


        If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
        if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and
        sets the variable pointed to by errptr to point to a textual error mes-
        sage. This is a static string that is part of the library. You must not
        try to free it. The byte offset from the start of the  pattern  to  the
-       character  that  was  being  processes when the error was discovered is
+       character  that  was  being  processed when the error was discovered is
        placed in the variable pointed to by erroffset, which must not be NULL.
        If  it  is,  an  immediate error is given. Some errors are not detected
        until checks are carried out when the whole pattern has  been  scanned;
@@ -1491,14 +1494,14 @@
        the results of the study.


        The  returned  value  from  pcre_study()  can  be  passed  directly  to
-       pcre_exec(). However, a pcre_extra block  also  contains  other  fields
-       that  can  be  set  by the caller before the block is passed; these are
-       described below in the section on matching a pattern.
+       pcre_exec() or pcre_dfa_exec(). However, a pcre_extra block  also  con-
+       tains  other  fields  that can be set by the caller before the block is
+       passed; these are described below in the section on matching a pattern.


-       If studying the pattern does not  produce  any  additional  information
+       If studying the  pattern  does  not  produce  any  useful  information,
        pcre_study() returns NULL. In that circumstance, if the calling program
-       wants to pass any of the other fields to pcre_exec(), it  must  set  up
-       its own pcre_extra block.
+       wants  to  pass  any  of   the   other   fields   to   pcre_exec()   or
+       pcre_dfa_exec(), it must set up its own pcre_extra block.


        The  second  argument of pcre_study() contains option bits. At present,
        no options are defined, and this argument should always be zero.
@@ -1518,63 +1521,72 @@
            0,              /* no options exist */
            &error);        /* set to NULL or points to a message */


-       At present, studying a pattern is useful only for non-anchored patterns
-       that do not have a single fixed starting character. A bitmap of  possi-
-       ble starting bytes is created.
+       Studying a pattern does two things: first, a lower bound for the length
+       of subject string that is needed to match the pattern is computed. This
+       does not mean that there are any strings of that length that match, but
+       it does guarantee that no shorter strings match. The value is  used  by
+       pcre_exec()  and  pcre_dfa_exec()  to  avoid  wasting time by trying to
+       match strings that are shorter than the lower bound. You can  find  out
+       the value in a calling program via the pcre_fullinfo() function.


+       Studying a pattern is also useful for non-anchored patterns that do not
+       have a single fixed starting character. A bitmap of  possible  starting
+       bytes  is  created. This speeds up finding a position in the subject at
+       which to start matching.


+
LOCALE SUPPORT

-       PCRE  handles  caseless matching, and determines whether characters are
-       letters, digits, or whatever, by reference to a set of tables,  indexed
-       by  character  value.  When running in UTF-8 mode, this applies only to
-       characters with codes less than 128. Higher-valued  codes  never  match
-       escapes  such  as  \w or \d, but can be tested with \p if PCRE is built
-       with Unicode character property support. The use of locales  with  Uni-
-       code  is discouraged. If you are handling characters with codes greater
-       than 128, you should either use UTF-8 and Unicode, or use locales,  but
+       PCRE handles caseless matching, and determines whether  characters  are
+       letters,  digits, or whatever, by reference to a set of tables, indexed
+       by character value. When running in UTF-8 mode, this  applies  only  to
+       characters  with  codes  less than 128. Higher-valued codes never match
+       escapes such as \w or \d, but can be tested with \p if  PCRE  is  built
+       with  Unicode  character property support. The use of locales with Uni-
+       code is discouraged. If you are handling characters with codes  greater
+       than  128, you should either use UTF-8 and Unicode, or use locales, but
        not try to mix the two.


-       PCRE  contains  an  internal set of tables that are used when the final
-       argument of pcre_compile() is  NULL.  These  are  sufficient  for  many
+       PCRE contains an internal set of tables that are used  when  the  final
+       argument  of  pcre_compile()  is  NULL.  These  are sufficient for many
        applications.  Normally, the internal tables recognize only ASCII char-
        acters. However, when PCRE is built, it is possible to cause the inter-
        nal tables to be rebuilt in the default "C" locale of the local system,
        which may cause them to be different.


-       The internal tables can always be overridden by tables supplied by  the
+       The  internal tables can always be overridden by tables supplied by the
        application that calls PCRE. These may be created in a different locale
-       from the default. As more and more applications change  to  using  Uni-
+       from  the  default.  As more and more applications change to using Uni-
        code, the need for this locale support is expected to die away.


-       External  tables  are  built by calling the pcre_maketables() function,
-       which has no arguments, in the relevant locale. The result can then  be
-       passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For
-       example, to build and use tables that are appropriate  for  the  French
-       locale  (where  accented  characters  with  values greater than 128 are
+       External tables are built by calling  the  pcre_maketables()  function,
+       which  has no arguments, in the relevant locale. The result can then be
+       passed to pcre_compile() or pcre_exec()  as  often  as  necessary.  For
+       example,  to  build  and use tables that are appropriate for the French
+       locale (where accented characters with  values  greater  than  128  are
        treated as letters), the following code could be used:


          setlocale(LC_CTYPE, "fr_FR");
          tables = pcre_maketables();
          re = pcre_compile(..., tables);


-       The locale name "fr_FR" is used on Linux and other  Unix-like  systems;
+       The  locale  name "fr_FR" is used on Linux and other Unix-like systems;
        if you are using Windows, the name for the French locale is "french".


-       When  pcre_maketables()  runs,  the  tables are built in memory that is
-       obtained via pcre_malloc. It is the caller's responsibility  to  ensure
-       that  the memory containing the tables remains available for as long as
+       When pcre_maketables() runs, the tables are built  in  memory  that  is
+       obtained  via  pcre_malloc. It is the caller's responsibility to ensure
+       that the memory containing the tables remains available for as long  as
        it is needed.


        The pointer that is passed to pcre_compile() is saved with the compiled
-       pattern,  and the same tables are used via this pointer by pcre_study()
+       pattern, and the same tables are used via this pointer by  pcre_study()
        and normally also by pcre_exec(). Thus, by default, for any single pat-
        tern, compilation, studying and matching all happen in the same locale,
        but different patterns can be compiled in different locales.


-       It is possible to pass a table pointer or NULL (indicating the  use  of
-       the  internal  tables)  to  pcre_exec(). Although not intended for this
-       purpose, this facility could be used to match a pattern in a  different
+       It  is  possible to pass a table pointer or NULL (indicating the use of
+       the internal tables) to pcre_exec(). Although  not  intended  for  this
+       purpose,  this facility could be used to match a pattern in a different
        locale from the one in which it was compiled. Passing table pointers at
        run time is discussed below in the section on matching a pattern.


@@ -1584,15 +1596,15 @@
        int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
             int what, void *where);


-       The pcre_fullinfo() function returns information about a compiled  pat-
+       The  pcre_fullinfo() function returns information about a compiled pat-
        tern. It replaces the obsolete pcre_info() function, which is neverthe-
        less retained for backwards compatibility (and is documented below).


-       The first argument for pcre_fullinfo() is a  pointer  to  the  compiled
-       pattern.  The second argument is the result of pcre_study(), or NULL if
-       the pattern was not studied. The third argument specifies  which  piece
-       of  information  is required, and the fourth argument is a pointer to a
-       variable to receive the data. The yield of the  function  is  zero  for
+       The  first  argument  for  pcre_fullinfo() is a pointer to the compiled
+       pattern. The second argument is the result of pcre_study(), or NULL  if
+       the  pattern  was not studied. The third argument specifies which piece
+       of information is required, and the fourth argument is a pointer  to  a
+       variable  to  receive  the  data. The yield of the function is zero for
        success, or one of the following negative numbers:


          PCRE_ERROR_NULL       the argument code was NULL
@@ -1600,9 +1612,9 @@
          PCRE_ERROR_BADMAGIC   the "magic number" was not found
          PCRE_ERROR_BADOPTION  the value of what was invalid


-       The  "magic  number" is placed at the start of each compiled pattern as
-       a  simple check against passing an arbitrary memory pointer. Here is  a
-       typical  call  of pcre_fullinfo(), to obtain the length of the compiled
+       The "magic number" is placed at the start of each compiled  pattern  as
+       a  simple check against passing an arbitrary memory pointer.  Here is a
+       typical call of pcre_fullinfo(), to obtain the length of  the  compiled
        pattern:


          int rc;
@@ -1613,111 +1625,131 @@
            PCRE_INFO_SIZE,   /* what is required */
            &length);         /* where to put the data */


-       The possible values for the third argument are defined in  pcre.h,  and
+       The  possible  values for the third argument are defined in pcre.h, and
        are as follows:


          PCRE_INFO_BACKREFMAX


-       Return  the  number  of  the highest back reference in the pattern. The
-       fourth argument should point to an int variable. Zero  is  returned  if
+       Return the number of the highest back reference  in  the  pattern.  The
+       fourth  argument  should  point to an int variable. Zero is returned if
        there are no back references.


          PCRE_INFO_CAPTURECOUNT


-       Return  the  number of capturing subpatterns in the pattern. The fourth
+       Return the number of capturing subpatterns in the pattern.  The  fourth
        argument should point to an int variable.


          PCRE_INFO_DEFAULT_TABLES


-       Return a pointer to the internal default character tables within  PCRE.
-       The  fourth  argument should point to an unsigned char * variable. This
+       Return  a pointer to the internal default character tables within PCRE.
+       The fourth argument should point to an unsigned char *  variable.  This
        information call is provided for internal use by the pcre_study() func-
-       tion.  External  callers  can  cause PCRE to use its internal tables by
+       tion. External callers can cause PCRE to use  its  internal  tables  by
        passing a NULL table pointer.


          PCRE_INFO_FIRSTBYTE


-       Return information about the first byte of any matched  string,  for  a
-       non-anchored  pattern. The fourth argument should point to an int vari-
-       able. (This option used to be called PCRE_INFO_FIRSTCHAR; the old  name
+       Return  information  about  the first byte of any matched string, for a
+       non-anchored pattern. The fourth argument should point to an int  vari-
+       able.  (This option used to be called PCRE_INFO_FIRSTCHAR; the old name
        is still recognized for backwards compatibility.)


-       If  there  is  a  fixed first byte, for example, from a pattern such as
+       If there is a fixed first byte, for example, from  a  pattern  such  as
        (cat|cow|coyote), its value is returned. Otherwise, if either


-       (a) the pattern was compiled with the PCRE_MULTILINE option, and  every
+       (a)  the pattern was compiled with the PCRE_MULTILINE option, and every
        branch starts with "^", or


        (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
        set (if it were set, the pattern would be anchored),


-       -1 is returned, indicating that the pattern matches only at  the  start
-       of  a  subject string or after any newline within the string. Otherwise
+       -1  is  returned, indicating that the pattern matches only at the start
+       of a subject string or after any newline within the  string.  Otherwise
        -2 is returned. For anchored patterns, -2 is returned.


          PCRE_INFO_FIRSTTABLE


-       If the pattern was studied, and this resulted in the construction of  a
+       If  the pattern was studied, and this resulted in the construction of a
        256-bit table indicating a fixed set of bytes for the first byte in any
-       matching string, a pointer to the table is returned. Otherwise NULL  is
-       returned.  The fourth argument should point to an unsigned char * vari-
+       matching  string, a pointer to the table is returned. Otherwise NULL is
+       returned. The fourth argument should point to an unsigned char *  vari-
        able.


          PCRE_INFO_HASCRORLF


-       Return 1 if the pattern contains any explicit  matches  for  CR  or  LF
-       characters,  otherwise  0.  The  fourth argument should point to an int
-       variable. An explicit match is either a literal CR or LF character,  or
+       Return  1  if  the  pattern  contains any explicit matches for CR or LF
+       characters, otherwise 0. The fourth argument should  point  to  an  int
+       variable.  An explicit match is either a literal CR or LF character, or
        \r or \n.


          PCRE_INFO_JCHANGED


-       Return  1  if  the (?J) or (?-J) option setting is used in the pattern,
-       otherwise 0. The fourth argument should point to an int variable.  (?J)
+       Return 1 if the (?J) or (?-J) option setting is used  in  the  pattern,
+       otherwise  0. The fourth argument should point to an int variable. (?J)
        and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.


          PCRE_INFO_LASTLITERAL


-       Return  the  value of the rightmost literal byte that must exist in any
-       matched string, other than at its  start,  if  such  a  byte  has  been
+       Return the value of the rightmost literal byte that must exist  in  any
+       matched  string,  other  than  at  its  start,  if such a byte has been
        recorded. The fourth argument should point to an int variable. If there
-       is no such byte, -1 is returned. For anchored patterns, a last  literal
-       byte  is  recorded only if it follows something of variable length. For
+       is  no such byte, -1 is returned. For anchored patterns, a last literal
+       byte is recorded only if it follows something of variable  length.  For
        example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
        /^a\dz\d/ the returned value is -1.


+         PCRE_INFO_MINLENGTH
+
+       If the pattern was studied and a minimum length  for  matching  subject
+       strings  was  computed,  its  value is returned. Otherwise the returned
+       value is -1. The value is a number of characters, not bytes  (this  may
+       be  relevant in UTF-8 mode). The fourth argument should point to an int
+       variable. A non-negative value is a lower bound to the  length  of  any
+       matching  string.  There  may not be any strings of that length that do
+       actually match, but every string that does match is at least that long.
+
          PCRE_INFO_NAMECOUNT
          PCRE_INFO_NAMEENTRYSIZE
          PCRE_INFO_NAMETABLE


-       PCRE  supports the use of named as well as numbered capturing parenthe-
-       ses. The names are just an additional way of identifying the  parenthe-
+       PCRE supports the use of named as well as numbered capturing  parenthe-
+       ses.  The names are just an additional way of identifying the parenthe-
        ses, which still acquire numbers. Several convenience functions such as
-       pcre_get_named_substring() are provided for  extracting  captured  sub-
-       strings  by  name. It is also possible to extract the data directly, by
-       first converting the name to a number in order to  access  the  correct
+       pcre_get_named_substring()  are  provided  for extracting captured sub-
+       strings by name. It is also possible to extract the data  directly,  by
+       first  converting  the  name to a number in order to access the correct
        pointers in the output vector (described with pcre_exec() below). To do
-       the conversion, you need  to  use  the  name-to-number  map,  which  is
+       the  conversion,  you  need  to  use  the  name-to-number map, which is
        described by these three values.


        The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
        gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
-       of  each  entry;  both  of  these  return  an int value. The entry size
-       depends on the length of the longest name. PCRE_INFO_NAMETABLE  returns
-       a  pointer  to  the  first  entry of the table (a pointer to char). The
+       of each entry; both of these  return  an  int  value.  The  entry  size
+       depends  on the length of the longest name. PCRE_INFO_NAMETABLE returns
+       a pointer to the first entry of the table  (a  pointer  to  char).  The
        first two bytes of each entry are the number of the capturing parenthe-
-       sis,  most  significant byte first. The rest of the entry is the corre-
-       sponding name, zero terminated. The names are  in  alphabetical  order.
-       When PCRE_DUPNAMES is set, duplicate names are in order of their paren-
-       theses numbers. For example, consider  the  following  pattern  (assume
-       PCRE_EXTENDED  is  set,  so  white  space  -  including  newlines  - is
-       ignored):
+       sis, most significant byte first. The rest of the entry is  the  corre-
+       sponding name, zero terminated.


+       The  names are in alphabetical order. Duplicate names may appear if (?|
+       is used to create multiple groups with the same number, as described in
+       the  section  on  duplicate subpattern numbers in the pcrepattern page.
+       Duplicate names for subpatterns with different  numbers  are  permitted
+       only  if  PCRE_DUPNAMES  is  set. In all cases of duplicate names, they
+       appear in the table in the order in which they were found in  the  pat-
+       tern.  In  the  absence  of (?| this is the order of increasing number;
+       when (?| is used this is not necessarily the case because later subpat-
+       terns may have lower numbers.
+
+       As  a  simple  example of the name/number table, consider the following
+       pattern (assume PCRE_EXTENDED is set, so white space -  including  new-
+       lines - is ignored):
+
          (?<date> (?<year>(\d\d)?\d\d) -
          (?<month>\d\d) - (?<day>\d\d) )


-       There are four named subpatterns, so the table has  four  entries,  and
-       each  entry  in the table is eight bytes long. The table is as follows,
+       There  are  four  named subpatterns, so the table has four entries, and
+       each entry in the table is eight bytes long. The table is  as  follows,
        with non-printing bytes shown in hexadecimal, and undefined bytes shown
        as ??:


@@ -1726,31 +1758,31 @@
          00 04 m  o  n  t  h  00
          00 02 y  e  a  r  00 ??


-       When  writing  code  to  extract  data from named subpatterns using the
-       name-to-number map, remember that the length of the entries  is  likely
+       When writing code to extract data  from  named  subpatterns  using  the
+       name-to-number  map,  remember that the length of the entries is likely
        to be different for each compiled pattern.


          PCRE_INFO_OKPARTIAL


-       Return  1  if  the  pattern  can  be  used  for  partial  matching with
-       pcre_exec(), otherwise 0. The fourth argument should point  to  an  int
-       variable.  From  release  8.00,  this  always  returns  1,  because the
-       restrictions that previously applied  to  partial  matching  have  been
-       lifted.  The  pcrepartial documentation gives details of partial match-
+       Return 1  if  the  pattern  can  be  used  for  partial  matching  with
+       pcre_exec(),  otherwise  0.  The fourth argument should point to an int
+       variable. From  release  8.00,  this  always  returns  1,  because  the
+       restrictions  that  previously  applied  to  partial matching have been
+       lifted. The pcrepartial documentation gives details of  partial  match-
        ing.


          PCRE_INFO_OPTIONS


-       Return a copy of the options with which the pattern was  compiled.  The
-       fourth  argument  should  point to an unsigned long int variable. These
+       Return  a  copy of the options with which the pattern was compiled. The
+       fourth argument should point to an unsigned long  int  variable.  These
        option bits are those specified in the call to pcre_compile(), modified
        by any top-level option settings at the start of the pattern itself. In
-       other words, they are the options that will be in force  when  matching
-       starts.  For  example, if the pattern /(?im)abc(?-i)d/ is compiled with
-       the PCRE_EXTENDED option, the result is PCRE_CASELESS,  PCRE_MULTILINE,
+       other  words,  they are the options that will be in force when matching
+       starts. For example, if the pattern /(?im)abc(?-i)d/ is  compiled  with
+       the  PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE,
        and PCRE_EXTENDED.


-       A  pattern  is  automatically  anchored by PCRE if all of its top-level
+       A pattern is automatically anchored by PCRE if  all  of  its  top-level
        alternatives begin with one of the following:


          ^     unless PCRE_MULTILINE is set
@@ -1764,7 +1796,7 @@


          PCRE_INFO_SIZE


-       Return  the  size  of the compiled pattern, that is, the value that was
+       Return the size of the compiled pattern, that is, the  value  that  was
        passed as the argument to pcre_malloc() when PCRE was getting memory in
        which to place the compiled data. The fourth argument should point to a
        size_t variable.
@@ -1772,9 +1804,10 @@
          PCRE_INFO_STUDYSIZE


        Return the size of the data block pointed to by the study_data field in
-       a  pcre_extra  block.  That  is,  it  is  the  value that was passed to
+       a pcre_extra block. That is,  it  is  the  value  that  was  passed  to
        pcre_malloc() when PCRE was getting memory into which to place the data
-       created  by  pcre_study(). The fourth argument should point to a size_t
+       created by pcre_study(). If pcre_extra is NULL, or there  is  no  study
+       data,  zero  is  returned. The fourth argument should point to a size_t
        variable.



@@ -1830,7 +1863,7 @@

        The  function pcre_exec() is called to match a subject string against a
        compiled pattern, which is passed in the code argument. If the  pattern
-       has been studied, the result of the study should be passed in the extra
+       was  studied,  the  result  of  the study should be passed in the extra
        argument. This function is the main matching facility of  the  library,
        and it operates in a Perl-like manner. For specialist use there is also
        an alternative matching function, which is described below in the  sec-
@@ -1889,8 +1922,8 @@
        The match_limit field provides a means of preventing PCRE from using up
        a  vast amount of resources when running patterns that are not going to
        match, but which have a very large number  of  possibilities  in  their
-       search  trees.  The  classic  example  is  the  use of nested unlimited
-       repeats.
+       search  trees. The classic example is a pattern that uses nested unlim-
+       ited repeats.


        Internally, PCRE uses a function called match() which it calls  repeat-
        edly  (sometimes  recursively). The limit set by match_limit is imposed
@@ -2177,7 +2210,7 @@
        has to get additional memory for use during matching. Thus it  is  usu-
        ally advisable to supply an ovector.


-       The  pcre_info()  function  can  be used to find out how many capturing
+       The pcre_fullinfo() function can be used to find out how many capturing
        subpatterns there are in a compiled  pattern.  The  smallest  size  for
        ovector  that  will allow for n captured substrings, in addition to the
        offsets of the substring matched by the whole pattern, is (n+1)*3.
@@ -2438,10 +2471,13 @@
        ate.  NOTE:  If PCRE_DUPNAMES is set and there are duplicate names, the
        behaviour may not be what you want (see the next section).


-       Warning: If the pattern uses the "(?|" feature to set up multiple  sub-
-       patterns  with  the  same  number,  you cannot use names to distinguish
-       them, because names are not included in the compiled code. The matching
-       process uses only numbers.
+       Warning: If the pattern uses the (?| feature to set up multiple subpat-
+       terns  with  the  same number, as described in the section on duplicate
+       subpattern numbers in the pcrepattern page, you  cannot  use  names  to
+       distinguish  the  different subpatterns, because names are not included
+       in the compiled code. The matching process uses only numbers. For  this
+       reason,  the  use of different names for subpatterns of the same number
+       causes an error at compile time.



 DUPLICATE SUBPATTERN NAMES
@@ -2449,47 +2485,51 @@
        int pcre_get_stringtable_entries(const pcre *code,
             const char *name, char **first, char **last);


-       When  a  pattern  is  compiled with the PCRE_DUPNAMES option, names for
-       subpatterns are not required to  be  unique.  Normally,  patterns  with
-       duplicate  names  are such that in any one match, only one of the named
-       subpatterns participates. An example is shown in the pcrepattern  docu-
-       mentation.
+       When a pattern is compiled with the  PCRE_DUPNAMES  option,  names  for
+       subpatterns  are not required to be unique. (Duplicate names are always
+       allowed for subpatterns with the same number, created by using the  (?|
+       feature.  Indeed,  if  such subpatterns are named, they are required to
+       use the same names.)


-       When    duplicates   are   present,   pcre_copy_named_substring()   and
-       pcre_get_named_substring() return the first substring corresponding  to
-       the  given  name  that  is set. If none are set, PCRE_ERROR_NOSUBSTRING
-       (-7) is returned; no  data  is  returned.  The  pcre_get_stringnumber()
-       function  returns one of the numbers that are associated with the name,
+       Normally, patterns with duplicate names are such that in any one match,
+       only  one of the named subpatterns participates. An example is shown in
+       the pcrepattern documentation.
+
+       When   duplicates   are   present,   pcre_copy_named_substring()    and
+       pcre_get_named_substring()  return the first substring corresponding to
+       the given name that is set. If  none  are  set,  PCRE_ERROR_NOSUBSTRING
+       (-7)  is  returned;  no  data  is returned. The pcre_get_stringnumber()
+       function returns one of the numbers that are associated with the  name,
        but it is not defined which it is.


-       If you want to get full details of all captured substrings for a  given
-       name,  you  must  use  the pcre_get_stringtable_entries() function. The
+       If  you want to get full details of all captured substrings for a given
+       name, you must use  the  pcre_get_stringtable_entries()  function.  The
        first argument is the compiled pattern, and the second is the name. The
-       third  and  fourth  are  pointers to variables which are updated by the
+       third and fourth are pointers to variables which  are  updated  by  the
        function. After it has run, they point to the first and last entries in
-       the  name-to-number  table  for  the  given  name.  The function itself
-       returns the length of each entry,  or  PCRE_ERROR_NOSUBSTRING  (-7)  if
-       there  are none. The format of the table is described above in the sec-
-       tion entitled Information about a  pattern.   Given  all  the  relevant
-       entries  for the name, you can extract each of their numbers, and hence
+       the name-to-number table  for  the  given  name.  The  function  itself
+       returns  the  length  of  each entry, or PCRE_ERROR_NOSUBSTRING (-7) if
+       there are none. The format of the table is described above in the  sec-
+       tion  entitled  Information  about  a  pattern.  Given all the relevant
+       entries for the name, you can extract each of their numbers, and  hence
        the captured data, if any.



FINDING ALL POSSIBLE MATCHES

-       The traditional matching function uses a  similar  algorithm  to  Perl,
+       The  traditional  matching  function  uses a similar algorithm to Perl,
        which stops when it finds the first match, starting at a given point in
-       the subject. If you want to find all possible matches, or  the  longest
-       possible  match,  consider using the alternative matching function (see
-       below) instead. If you cannot use the alternative function,  but  still
-       need  to  find all possible matches, you can kludge it up by making use
+       the  subject.  If you want to find all possible matches, or the longest
+       possible match, consider using the alternative matching  function  (see
+       below)  instead.  If you cannot use the alternative function, but still
+       need to find all possible matches, you can kludge it up by  making  use
        of the callout facility, which is described in the pcrecallout documen-
        tation.


        What you have to do is to insert a callout right at the end of the pat-
-       tern.  When your callout function is called, extract and save the  cur-
-       rent  matched  substring.  Then  return  1, which forces pcre_exec() to
-       backtrack and try other alternatives. Ultimately, when it runs  out  of
+       tern.   When your callout function is called, extract and save the cur-
+       rent matched substring. Then return  1,  which  forces  pcre_exec()  to
+       backtrack  and  try other alternatives. Ultimately, when it runs out of
        matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
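
The callout kludge described above can be sketched in C roughly as follows.
This is only an illustration, not code from PCRE: demo_callout_block is a
hypothetical stand-in that mirrors four of the documented pcre_callout block
fields (subject, start_match, current_position, callout_data), and
saved_matches, MAX_SAVED, MAX_LEN and save_and_backtrack() are names invented
for the sketch. In a real program you would include pcre.h, use the library's
own callout block type, and point pcre_callout at your function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the callout block: only the fields used here. */
typedef struct {
    const char *subject;          /* the subject string                     */
    int         start_match;      /* offset where this match attempt began  */
    int         current_position; /* offset of the current match pointer    */
    void       *callout_data;     /* caller's data, passed back unchanged   */
} demo_callout_block;

#define MAX_SAVED 16              /* assumption: at most 16 matches kept    */
#define MAX_LEN   64              /* assumption: each match under 64 bytes  */

/* Caller-owned storage, reached through callout_data. */
typedef struct {
    int  count;
    char text[MAX_SAVED][MAX_LEN];
} saved_matches;

/* Callout placed at the very end of the pattern: save the substring that
   has been matched so far, then return 1 so that the matcher backtracks
   and tries the remaining alternatives. */
int save_and_backtrack(demo_callout_block *cb)
{
    saved_matches *out;
    int len;

    assert(cb != NULL);
    out = (saved_matches *)cb->callout_data;
    len = cb->current_position - cb->start_match;

    if (out != NULL && out->count < MAX_SAVED && len >= 0 && len < MAX_LEN) {
        memcpy(out->text[out->count], cb->subject + cb->start_match,
               (size_t)len);
        out->text[out->count][len] = '\0';
        out->count++;
    }
    return 1;   /* greater than zero: fail at this point, keep searching */
}
```

Because the callout always returns 1, the overall call never reports a match;
the collected substrings are whatever the storage holds once the matcher has
given up with PCRE_ERROR_NOMATCH.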



@@ -2500,26 +2540,26 @@
             int options, int *ovector, int ovecsize,
             int *workspace, int wscount);


-       The  function  pcre_dfa_exec()  is  called  to  match  a subject string
-       against a compiled pattern, using a matching algorithm that  scans  the
-       subject  string  just  once, and does not backtrack. This has different
-       characteristics to the normal algorithm, and  is  not  compatible  with
-       Perl.  Some  of the features of PCRE patterns are not supported. Never-
-       theless, there are times when this kind of matching can be useful.  For
-       a  discussion  of  the  two matching algorithms, and a list of features
-       that pcre_dfa_exec() does not support, see the pcrematching  documenta-
+       The function pcre_dfa_exec()  is  called  to  match  a  subject  string
+       against  a  compiled pattern, using a matching algorithm that scans the
+       subject string just once, and does not backtrack.  This  has  different
+       characteristics  to  the  normal  algorithm, and is not compatible with
+       Perl. Some of the features of PCRE patterns are not  supported.  Never-
+       theless,  there are times when this kind of matching can be useful. For
+       a discussion of the two matching algorithms, and  a  list  of  features
+       that  pcre_dfa_exec() does not support, see the pcrematching documenta-
        tion.


-       The  arguments  for  the  pcre_dfa_exec()  function are the same as for
+       The arguments for the pcre_dfa_exec() function  are  the  same  as  for
        pcre_exec(), plus two extras. The ovector argument is used in a differ-
-       ent  way,  and  this is described below. The other common arguments are
-       used in the same way as for pcre_exec(), so their  description  is  not
+       ent way, and this is described below. The other  common  arguments  are
+       used  in  the  same way as for pcre_exec(), so their description is not
        repeated here.


-       The  two  additional  arguments provide workspace for the function. The
-       workspace vector should contain at least 20 elements. It  is  used  for
+       The two additional arguments provide workspace for  the  function.  The
+       workspace  vector  should  contain at least 20 elements. It is used for
        keeping  track  of  multiple  paths  through  the  pattern  tree.  More
-       workspace will be needed for patterns and subjects where  there  are  a
+       workspace  will  be  needed for patterns and subjects where there are a
        lot of potential matches.


        Here is an example of a simple call to pcre_dfa_exec():
@@ -2541,52 +2581,52 @@


    Option bits for pcre_dfa_exec()


-       The  unused  bits  of  the options argument for pcre_dfa_exec() must be
-       zero. The only bits  that  may  be  set  are  PCRE_ANCHORED,  PCRE_NEW-
+       The unused bits of the options argument  for  pcre_dfa_exec()  must  be
+       zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW-
        LINE_xxx,        PCRE_NOTBOL,        PCRE_NOTEOL,        PCRE_NOTEMPTY,
        PCRE_NOTEMPTY_ATSTART, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, PCRE_PAR-
-       TIAL_SOFT,  PCRE_DFA_SHORTEST,  and  PCRE_DFA_RESTART. All but the last
-       four of these are  exactly  the  same  as  for  pcre_exec(),  so  their
+       TIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All  but  the  last
+       four  of  these  are  exactly  the  same  as  for pcre_exec(), so their
        description is not repeated here.


          PCRE_PARTIAL_HARD
          PCRE_PARTIAL_SOFT


-       These  have the same general effect as they do for pcre_exec(), but the
-       details are slightly  different.  When  PCRE_PARTIAL_HARD  is  set  for
-       pcre_dfa_exec(),  it  returns PCRE_ERROR_PARTIAL if the end of the sub-
-       ject is reached and there is still at least  one  matching  possibility
+       These have the same general effect as they do for pcre_exec(), but  the
+       details  are  slightly  different.  When  PCRE_PARTIAL_HARD  is set for
+       pcre_dfa_exec(), it returns PCRE_ERROR_PARTIAL if the end of  the  sub-
+       ject  is  reached  and there is still at least one matching possibility
        that requires additional characters. This happens even if some complete
        matches have also been found. When PCRE_PARTIAL_SOFT is set, the return
        code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end
-       of the subject is reached, there have been  no  complete  matches,  but
-       there  is  still  at least one matching possibility. The portion of the
-       string that was inspected when the longest partial match was  found  is
+       of  the  subject  is  reached, there have been no complete matches, but
+       there is still at least one matching possibility. The  portion  of  the
+       string  that  was inspected when the longest partial match was found is
        set as the first matching string in both cases.


          PCRE_DFA_SHORTEST


-       Setting  the  PCRE_DFA_SHORTEST option causes the matching algorithm to
+       Setting the PCRE_DFA_SHORTEST option causes the matching  algorithm  to
        stop as soon as it has found one match. Because of the way the alterna-
-       tive  algorithm  works, this is necessarily the shortest possible match
+       tive algorithm works, this is necessarily the shortest  possible  match
        at the first possible matching point in the subject string.


          PCRE_DFA_RESTART


        When pcre_dfa_exec() returns a partial match, it is possible to call it
-       again,  with  additional  subject characters, and have it continue with
-       the same match. The PCRE_DFA_RESTART option requests this action;  when
-       it  is  set,  the workspace and wscount options must reference the same
-       vector as before because data about the match so far is  left  in  them
+       again, with additional subject characters, and have  it  continue  with
+       the  same match. The PCRE_DFA_RESTART option requests this action; when
+       it is set, the workspace and wscount options must  reference  the  same
+       vector  as  before  because data about the match so far is left in them
        after a partial match. There is more discussion of this facility in the
        pcrepartial documentation.


    Successful returns from pcre_dfa_exec()


-       When pcre_dfa_exec() succeeds, it may have matched more than  one  sub-
+       When  pcre_dfa_exec()  succeeds, it may have matched more than one sub-
        string in the subject. Note, however, that all the matches from one run
-       of the function start at the same point in  the  subject.  The  shorter
-       matches  are all initial substrings of the longer matches. For example,
+       of  the  function  start  at the same point in the subject. The shorter
+       matches are all initial substrings of the longer matches. For  example,
        if the pattern


          <.*>
@@ -2601,61 +2641,61 @@
          <something> <something else>
          <something> <something else> <something further>


-       On success, the yield of the function is a number  greater  than  zero,
-       which  is  the  number of matched substrings. The substrings themselves
-       are returned in ovector. Each string uses two elements;  the  first  is
-       the  offset  to  the start, and the second is the offset to the end. In
-       fact, all the strings have the same start  offset.  (Space  could  have
-       been  saved by giving this only once, but it was decided to retain some
-       compatibility with the way pcre_exec() returns data,  even  though  the
+       On  success,  the  yield of the function is a number greater than zero,
+       which is the number of matched substrings.  The  substrings  themselves
+       are  returned  in  ovector. Each string uses two elements; the first is
+       the offset to the start, and the second is the offset to  the  end.  In
+       fact,  all  the  strings  have the same start offset. (Space could have
+       been saved by giving this only once, but it was decided to retain  some
+       compatibility  with  the  way pcre_exec() returns data, even though the
        meaning of the strings is different.)


        The strings are returned in reverse order of length; that is, the long-
-       est matching string is given first. If there were too many  matches  to
-       fit  into ovector, the yield of the function is zero, and the vector is
+       est  matching  string is given first. If there were too many matches to
+       fit into ovector, the yield of the function is zero, and the vector  is
        filled with the longest matches.
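
The layout of ovector after a successful call can be illustrated with a small
helper. This is a sketch under stated assumptions, not part of the PCRE API:
copy_dfa_match() is a hypothetical name, rc is taken to be the positive value
returned by pcre_dfa_exec(), and the zero return (too many matches for the
vector) is deliberately not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy match number idx (0 is the longest) out of an offset vector laid
   out as described above: element 2*idx is the start offset and element
   2*idx+1 is the end offset. rc is the number of matches stored.
   Returns the match length, or -1 if idx is out of range or the buffer
   is too small. */
int copy_dfa_match(const char *subject, const int *ovector, int rc,
                   int idx, char *buf, size_t bufsize)
{
    int start, end, len;

    if (idx < 0 || idx >= rc) return -1;

    start = ovector[2 * idx];
    end   = ovector[2 * idx + 1];
    assert(start <= end);          /* offsets always delimit a substring */

    len = end - start;
    if ((size_t)len + 1 > bufsize) return -1;

    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

Looping idx from 0 to rc-1 visits the matches from longest to shortest, and
every pair carries the same start offset.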


    Error returns from pcre_dfa_exec()


-       The pcre_dfa_exec() function returns a negative number when  it  fails.
-       Many  of  the  errors  are  the  same as for pcre_exec(), and these are
-       described above.  There are in addition the following errors  that  are
+       The  pcre_dfa_exec()  function returns a negative number when it fails.
+       Many of the errors are the same  as  for  pcre_exec(),  and  these  are
+       described  above.   There are in addition the following errors that are
        specific to pcre_dfa_exec():


          PCRE_ERROR_DFA_UITEM      (-16)


-       This  return is given if pcre_dfa_exec() encounters an item in the pat-
-       tern that it does not support, for instance, the use of \C  or  a  back
+       This return is given if pcre_dfa_exec() encounters an item in the  pat-
+       tern  that  it  does not support, for instance, the use of \C or a back
        reference.


          PCRE_ERROR_DFA_UCOND      (-17)


-       This  return  is  given  if pcre_dfa_exec() encounters a condition item
-       that uses a back reference for the condition, or a test  for  recursion
+       This return is given if pcre_dfa_exec()  encounters  a  condition  item
+       that  uses  a back reference for the condition, or a test for recursion
        in a specific group. These are not supported.


          PCRE_ERROR_DFA_UMLIMIT    (-18)


-       This  return  is given if pcre_dfa_exec() is called with an extra block
+       This return is given if pcre_dfa_exec() is called with an  extra  block
        that contains a setting of the match_limit field. This is not supported
        (it is meaningless).


          PCRE_ERROR_DFA_WSSIZE     (-19)


-       This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the
+       This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the
        workspace vector.


          PCRE_ERROR_DFA_RECURSE    (-20)


-       When a recursive subpattern is processed, the matching  function  calls
-       itself  recursively,  using  private vectors for ovector and workspace.
-       This error is given if the output vector  is  not  large  enough.  This
+       When  a  recursive subpattern is processed, the matching function calls
+       itself recursively, using private vectors for  ovector  and  workspace.
+       This  error  is  given  if  the output vector is not large enough. This
        should be extremely rare, as a vector of size 1000 is used.



SEE ALSO

-       pcrebuild(3),  pcrecallout(3), pcrecpp(3), pcrematching(3), pcrepar-
+       pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3),  pcrepar-
        tial(3), pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).



@@ -2668,11 +2708,11 @@

REVISION

-       Last updated: 22 September 2009
+       Last updated: 03 October 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRECALLOUT(3)                                                  PCRECALLOUT(3)



@@ -2698,10 +2738,10 @@

          (?C1)abc(?C2)def


-       If  the  PCRE_AUTO_CALLOUT  option  bit  is  set when pcre_compile() is
-       called, PCRE automatically  inserts  callouts,  all  with  number  255,
-       before  each  item in the pattern. For example, if PCRE_AUTO_CALLOUT is
-       used with the pattern
+       If  the  PCRE_AUTO_CALLOUT  option  bit  is  set when pcre_compile() or
+       pcre_compile2() is called, PCRE  automatically  inserts  callouts,  all
+       with  number  255,  before  each  item  in the pattern. For example, if
+       PCRE_AUTO_CALLOUT is used with the pattern


          A(\d{2}|--)


@@ -2730,18 +2770,23 @@
        ever  start,  and  the  callout is never reached. However, with "abyd",
        though the result is still no match, the callout is obeyed.


-       You can disable these optimizations by passing the  PCRE_NO_START_OPTI-
-       MIZE  option  to  pcre_exec()  or  pcre_dfa_exec(). This slows down the
-       matching process, but does ensure that callouts  such  as  the  example
+       If the pattern is studied, PCRE knows the minimum length of a  matching
+       string,  and will immediately give a "no match" return without actually
+       running a match if the subject is not long enough, or,  for  unanchored
+       patterns, if it has been scanned far enough.
+
+       You  can disable these optimizations by passing the PCRE_NO_START_OPTI-
+       MIZE option to pcre_exec() or  pcre_dfa_exec().  This  slows  down  the
+       matching  process,  but  does  ensure that callouts such as the example
        above are obeyed.



THE CALLOUT INTERFACE

-       During  matching, when PCRE reaches a callout point, the external func-
-       tion defined by pcre_callout is called (if it is set). This applies  to
-       both  the  pcre_exec()  and the pcre_dfa_exec() matching functions. The
-       only argument to the callout function is a pointer  to  a  pcre_callout
+       During matching, when PCRE reaches a callout point, the external  func-
+       tion  defined by pcre_callout is called (if it is set). This applies to
+       both the pcre_exec() and the pcre_dfa_exec()  matching  functions.  The
+       only  argument  to  the callout function is a pointer to a pcre_callout
        block. This structure contains the following fields:


          int          version;
@@ -2757,81 +2802,81 @@
          int          pattern_position;
          int          next_item_length;


-       The  version  field  is an integer containing the version number of the
-       block format. The initial version was 0; the current version is 1.  The
-       version  number  will  change  again in future if additional fields are
+       The version field is an integer containing the version  number  of  the
+       block  format. The initial version was 0; the current version is 1. The
+       version number will change again in future  if  additional  fields  are
        added, but the intention is never to remove any of the existing fields.


-       The callout_number field contains the number of the  callout,  as  com-
-       piled  into  the pattern (that is, the number after ?C for manual call-
+       The  callout_number  field  contains the number of the callout, as com-
+       piled into the pattern (that is, the number after ?C for  manual  call-
        outs, and 255 for automatically generated callouts).


-       The offset_vector field is a pointer to the vector of offsets that  was
-       passed   by   the   caller  to  pcre_exec()  or  pcre_dfa_exec().  When
-       pcre_exec() is used, the contents can be inspected in order to  extract
-       substrings  that  have  been  matched  so  far,  in the same way as for
-       extracting substrings after a match has completed. For  pcre_dfa_exec()
+       The  offset_vector field is a pointer to the vector of offsets that was
+       passed  by  the  caller  to  pcre_exec()   or   pcre_dfa_exec().   When
+       pcre_exec()  is used, the contents can be inspected in order to extract
+       substrings that have been matched so  far,  in  the  same  way  as  for
+       extracting  substrings after a match has completed. For pcre_dfa_exec()
        this field is not useful.


        The subject and subject_length fields contain copies of the values that
        were passed to pcre_exec().


-       The start_match field normally contains the offset within  the  subject
-       at  which  the  current  match  attempt started. However, if the escape
-       sequence \K has been encountered, this value is changed to reflect  the
-       modified  starting  point.  If the pattern is not anchored, the callout
+       The  start_match  field normally contains the offset within the subject
+       at which the current match attempt  started.  However,  if  the  escape
+       sequence  \K has been encountered, this value is changed to reflect the
+       modified starting point. If the pattern is not  anchored,  the  callout
        function may be called several times from the same point in the pattern
        for different starting points in the subject.


-       The  current_position  field  contains the offset within the subject of
+       The current_position field contains the offset within  the  subject  of
        the current match pointer.


-       When the pcre_exec() function is used, the capture_top  field  contains
-       one  more than the number of the highest numbered captured substring so
-       far. If no substrings have been captured, the value of  capture_top  is
-       one.  This  is always the case when pcre_dfa_exec() is used, because it
+       When  the  pcre_exec() function is used, the capture_top field contains
+       one more than the number of the highest numbered captured substring  so
+       far.  If  no substrings have been captured, the value of capture_top is
+       one. This is always the case when pcre_dfa_exec() is used,  because  it
        does not support captured substrings.


-       The capture_last field contains the number of the  most  recently  cap-
-       tured  substring. If no substrings have been captured, its value is -1.
+       The  capture_last  field  contains the number of the most recently cap-
+       tured substring. If no substrings have been captured, its value is  -1.
        This is always the case when pcre_dfa_exec() is used.


-       The callout_data field contains a value that is passed  to  pcre_exec()
-       or  pcre_dfa_exec() specifically so that it can be passed back in call-
-       outs. It is passed in the callout_data field  of  the  pcre_extra  data
-       structure.  If  no such data was passed, the value of callout_data in a
-       pcre_callout block is NULL. There is a description  of  the  pcre_extra
+       The  callout_data  field contains a value that is passed to pcre_exec()
+       or pcre_dfa_exec() specifically so that it can be passed back in  call-
+       outs.  It  is  passed  in the callout_data field of the pcre_extra data
+       structure. If no such data was passed, the value of callout_data  in  a
+       pcre_callout  block  is  NULL. There is a description of the pcre_extra
        structure in the pcreapi documentation.


-       The  pattern_position field is present from version 1 of the pcre_call-
+       The pattern_position field is present from version 1 of the  pcre_call-
        out structure. It contains the offset to the next item to be matched in
        the pattern string.


-       The  next_item_length field is present from version 1 of the pcre_call-
+       The next_item_length field is present from version 1 of the  pcre_call-
        out structure. It contains the length of the next item to be matched in
-       the  pattern  string. When the callout immediately precedes an alterna-
-       tion bar, a closing parenthesis, or the end of the pattern, the  length
-       is  zero.  When the callout precedes an opening parenthesis, the length
+       the pattern string. When the callout immediately precedes  an  alterna-
+       tion  bar, a closing parenthesis, or the end of the pattern, the length
+       is zero. When the callout precedes an opening parenthesis,  the  length
        is that of the entire subpattern.


-       The pattern_position and next_item_length fields are intended  to  help
-       in  distinguishing between different automatic callouts, which all have
+       The  pattern_position  and next_item_length fields are intended to help
+       in distinguishing between different automatic callouts, which all  have
        the same callout number. However, they are set for all callouts.



RETURN VALUES

-       The external callout function returns an integer to PCRE. If the  value
-       is  zero,  matching  proceeds  as  normal. If the value is greater than
-       zero, matching fails at the current point, but  the  testing  of  other
+       The  external callout function returns an integer to PCRE. If the value
+       is zero, matching proceeds as normal. If  the  value  is  greater  than
+       zero,  matching  fails  at  the current point, but the testing of other
        matching possibilities goes ahead, just as if a lookahead assertion had
-       failed. If the value is less than zero, the  match  is  abandoned,  and
-       pcre_exec() (or pcre_dfa_exec()) returns the negative value.
+       failed.  If  the  value  is less than zero, the match is abandoned, and
+       pcre_exec() or pcre_dfa_exec() returns the negative value.


-       Negative   values   should   normally   be   chosen  from  the  set  of
+       Negative  values  should  normally  be   chosen   from   the   set   of
        PCRE_ERROR_xxx values. In particular, PCRE_ERROR_NOMATCH forces a stan-
-       dard  "no  match"  failure.   The  error  number  PCRE_ERROR_CALLOUT is
-       reserved for use by callout functions; it will never be  used  by  PCRE
+       dard "no  match"  failure.   The  error  number  PCRE_ERROR_CALLOUT  is
+       reserved  for  use  by callout functions; it will never be used by PCRE
        itself.



@@ -2844,11 +2889,11 @@

REVISION

-       Last updated: 15 March 2009
+       Last updated: 29 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRECOMPAT(3)                                                    PCRECOMPAT(3)



@@ -2859,50 +2904,49 @@
DIFFERENCES BETWEEN PCRE AND PERL

        This  document describes the differences in the ways that PCRE and Perl
-       handle regular expressions. The differences described here  are  mainly
-       with  respect  to  Perl 5.8, though PCRE versions 7.0 and later contain
-       some features that are in Perl 5.10.
+       handle regular expressions. The differences  described  here  are  with
+       respect to Perl 5.10.


-       1. PCRE has only a subset of Perl's UTF-8 and Unicode support.  Details
-       of  what  it does have are given in the section on UTF-8 support in the
+       1.  PCRE has only a subset of Perl's UTF-8 and Unicode support. Details
+       of what it does have are given in the section on UTF-8 support  in  the
        main pcre page.


        2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl
-       permits  them,  but they do not mean what you might think. For example,
+       permits them, but they do not mean what you might think.  For  example,
        (?!a){3} does not assert that the next three characters are not "a". It
        just asserts that the next character is not "a" three times.


-       3.  Capturing  subpatterns  that occur inside negative lookahead asser-
-       tions are counted, but their entries in the offsets  vector  are  never
-       set.  Perl sets its numerical variables from any such patterns that are
+       3. Capturing subpatterns that occur inside  negative  lookahead  asser-
+       tions  are  counted,  but their entries in the offsets vector are never
+       set. Perl sets its numerical variables from any such patterns that  are
        matched before the assertion fails to match something (thereby succeed-
-       ing),  but  only  if the negative lookahead assertion contains just one
+       ing), but only if the negative lookahead assertion  contains  just  one
        branch.


-       4. Though binary zero characters are supported in the  subject  string,
+       4.  Though  binary zero characters are supported in the subject string,
        they are not allowed in a pattern string because it is passed as a nor-
        mal C string, terminated by zero. The escape sequence \0 can be used in
        the pattern to represent a binary zero.


-       5.  The  following Perl escape sequences are not supported: \l, \u, \L,
+       5. The following Perl escape sequences are not supported: \l,  \u,  \L,
        \U, and \N. In fact these are implemented by Perl's general string-han-
-       dling  and are not part of its pattern matching engine. If any of these
+       dling and are not part of its pattern matching engine. If any of  these
        are encountered by PCRE, an error is generated.


-       6. The Perl escape sequences \p, \P, and \X are supported only if  PCRE
-       is  built  with Unicode character property support. The properties that
-       can be tested with \p and \P are limited to the general category  prop-
-       erties  such  as  Lu and Nd, script names such as Greek or Han, and the
-       derived properties Any and L&. PCRE does  support  the  Cs  (surrogate)
-       property,  which  Perl  does  not; the Perl documentation says "Because
+       6.  The Perl escape sequences \p, \P, and \X are supported only if PCRE
+       is built with Unicode character property support. The  properties  that
+       can  be tested with \p and \P are limited to the general category prop-
+       erties such as Lu and Nd, script names such as Greek or  Han,  and  the
+       derived  properties  Any  and  L&. PCRE does support the Cs (surrogate)
+       property, which Perl does not; the  Perl  documentation  says  "Because
        Perl hides the need for the user to understand the internal representa-
-       tion  of Unicode characters, there is no need to implement the somewhat
+       tion of Unicode characters, there is no need to implement the  somewhat
        messy concept of surrogates."


        7. PCRE does support the \Q...\E escape for quoting substrings. Charac-
-       ters  in  between  are  treated as literals. This is slightly different
-       from Perl in that $ and @ are  also  handled  as  literals  inside  the
-       quotes.  In Perl, they cause variable interpolation (but of course PCRE
+       ters in between are treated as literals.  This  is  slightly  different
+       from  Perl  in  that  $  and  @ are also handled as literals inside the
+       quotes. In Perl, they cause variable interpolation (but of course  PCRE
        does not have variables). Note the following examples:


            Pattern            PCRE matches      Perl matches
@@ -2912,47 +2956,59 @@
            \Qabc\$xyz\E       abc\$xyz          abc\$xyz
            \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz


-       The \Q...\E sequence is recognized both inside  and  outside  character
+       The  \Q...\E  sequence  is recognized both inside and outside character
        classes.


        8. Fairly obviously, PCRE does not support the (?{code}) and (??{code})
-       constructions. However, there is support for recursive  patterns.  This
-       is  not  available  in Perl 5.8, but it is in Perl 5.10. Also, the PCRE
-       "callout" feature allows an external function to be called during  pat-
+       constructions.  However,  there is support for recursive patterns. This
+       is not available in Perl 5.8, but it is in Perl 5.10.  Also,  the  PCRE
+       "callout"  feature allows an external function to be called during pat-
        tern matching. See the pcrecallout documentation for details.


-       9.  Subpatterns  that  are  called  recursively or as "subroutines" are
-       always treated as atomic groups in  PCRE.  This  is  like  Python,  but
-       unlike  Perl. There is a discussion of an example that explains this in
-       more detail in the section on recursion differences from  Perl  in  the
-       pcrecompat page.
+       9. Subpatterns that are called  recursively  or  as  "subroutines"  are
+       always  treated  as  atomic  groups  in  PCRE. This is like Python, but
+       unlike Perl. There is a discussion of an example that explains this  in
+       more  detail  in  the section on recursion differences from Perl in the
+       pcrepattern page.


-       10.  There are some differences that are concerned with the settings of
-       captured strings when part of  a  pattern  is  repeated.  For  example,
-       matching  "aba"  against  the  pattern  /^(a(b)?)+$/  in Perl leaves $2
+       10. There are some differences that are concerned with the settings  of
+       captured  strings  when  part  of  a  pattern is repeated. For example,
+       matching "aba" against the  pattern  /^(a(b)?)+$/  in  Perl  leaves  $2
        unset, but in PCRE it is set to "b".


        11.  PCRE  does  support  Perl  5.10's  backtracking  verbs  (*ACCEPT),
-       (*FAIL),  (*F),  (*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but only in
+       (*FAIL), (*F), (*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but  only  in
        the forms without an argument. PCRE does not support (*MARK).


-       12. PCRE provides some extensions to the Perl regular expression facil-
-       ities.   Perl  5.10  will  include new features that are not in earlier
-       versions, some of which (such as named parentheses) have been  in  PCRE
-       for some time. This list is with respect to Perl 5.10:
+       12.  PCRE's handling of duplicate subpattern numbers and duplicate sub-
+       pattern names is not as general as Perl's. This is a consequence of the
+       fact that PCRE works internally just with numbers, using an external ta-
+       ble to translate between numbers and names. In  particular,  a  pattern
+       such as (?|(?<a>A)|(?<b>B)),  where the two capturing parentheses have
+       the same number but different names, is not supported,  and  causes  an
+       error  at compile time. If it were allowed, it would not be possible to
+       distinguish which parentheses matched, because both names map  to  cap-
+       turing subpattern number 1. To avoid this confusing situation, an error
+       is given at compile time.


-       (a)  Although  lookbehind  assertions  must match fixed length strings,
-       each alternative branch of a lookbehind assertion can match a different
-       length of string. Perl requires them all to have the same length.
+       13. PCRE provides some extensions to the Perl regular expression facil-
+       ities.   Perl  5.10  includes new features that are not in earlier ver-
+       sions of Perl, some of which (such as named parentheses) have  been  in
+       PCRE for some time. This list is with respect to Perl 5.10:


-       (b)  If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
+       (a)  Although  lookbehind  assertions  in  PCRE must match fixed length
+       strings, each alternative branch of a lookbehind assertion can match  a
+       different  length  of  string.  Perl requires them all to have the same
+       length.
+
+       (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the  $
        meta-character matches only at the very end of the string.


        (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-
        cial meaning is faulted. Otherwise, like Perl, the backslash is quietly
        ignored.  (Perl can be made to issue a warning.)


-       (d) If PCRE_UNGREEDY is set, the greediness of the  repetition  quanti-
+       (d)  If  PCRE_UNGREEDY is set, the greediness of the repetition quanti-
        fiers is inverted, that is, by default they are not greedy, but if fol-
        lowed by a question mark they are.


@@ -2960,10 +3016,10 @@
        tried only at the first matching position in the subject string.


        (f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
-       and PCRE_NO_AUTO_CAPTURE options for pcre_exec() have no  Perl  equiva-
+       and  PCRE_NO_AUTO_CAPTURE  options for pcre_exec() have no Perl equiva-
        lents.


-       (g)  The  \R escape sequence can be restricted to match only CR, LF, or
+       (g) The \R escape sequence can be restricted to match only CR,  LF,  or
        CRLF by the PCRE_BSR_ANYCRLF option.


        (h) The callout facility is PCRE-specific.
@@ -2973,10 +3029,10 @@
        (j) Patterns compiled by PCRE can be saved and re-used at a later time,
        even on different hosts that have the other endianness.


-       (k)  The  alternative  matching function (pcre_dfa_exec()) matches in a
+       (k) The alternative matching function (pcre_dfa_exec())  matches  in  a
        different way and is not Perl-compatible.


-       (l) PCRE recognizes some special sequences such as (*CR) at  the  start
+       (l)  PCRE  recognizes some special sequences such as (*CR) at the start
        of a pattern that set overall options that cannot be changed within the
        pattern.


@@ -2990,11 +3046,11 @@

REVISION

-       Last updated: 18 September 2009
+       Last updated: 04 October 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREPATTERN(3)                                                  PCREPATTERN(3)



@@ -3021,9 +3077,9 @@

        The original operation of PCRE was on strings of  one-byte  characters.
        However,  there is now also support for UTF-8 character strings. To use
-       this, you must build PCRE to  include  UTF-8  support,  and  then  call
-       pcre_compile()  with  the  PCRE_UTF8  option.  There  is also a special
-       sequence that can be given at the start of a pattern:
+       this, PCRE must be built to include UTF-8 support, and  you  must  call
+       pcre_compile()  or  pcre_compile2() with the PCRE_UTF8 option. There is
+       also a special sequence that can be given at the start of a pattern:


          (*UTF8)


@@ -3061,9 +3117,9 @@
          (*ANYCRLF)   any of the three above
          (*ANY)       all Unicode newline sequences


-       These override the default and the options given to pcre_compile(). For
-       example, on a Unix system where LF is the default newline sequence, the
-       pattern
+       These  override  the default and the options given to pcre_compile() or
+       pcre_compile2(). For example, on a Unix system where LF is the  default
+       newline sequence, the pattern


          (*CR)a.b


@@ -3180,7 +3236,7 @@
        acters  in patterns in a visible manner. There is no restriction on the
        appearance of non-printing characters, apart from the binary zero  that
        terminates  a  pattern,  but  when  a pattern is being prepared by text
-       editing, it is usually easier  to  use  one  of  the  following  escape
+       editing, it is  often  easier  to  use  one  of  the  following  escape
        sequences than the binary character it represents:


          \a        alarm, that is, the BEL character (hex 07)
@@ -3392,13 +3448,13 @@
          (*BSR_ANYCRLF)   CR, LF, or CRLF only
          (*BSR_UNICODE)   any Unicode newline sequence


-       These override the default and the options given to pcre_compile(), but
-       they can be overridden by options given to pcre_exec(). Note that these
-       special settings, which are not Perl-compatible, are recognized only at
-       the  very  start  of a pattern, and that they must be in upper case. If
-       more than one of them is present, the last one is  used.  They  can  be
-       combined  with  a  change of newline convention, for example, a pattern
-       can start with:
+       These override the default and the options given to  pcre_compile()  or
+       pcre_compile2(),  but  they  can  be  overridden  by  options  given to
+       pcre_exec() or pcre_dfa_exec(). Note that these special settings, which
+       are  not  Perl-compatible,  are  recognized only at the very start of a
+       pattern, and that they must be in upper case. If more than one of  them
+       is present, the last one is used. They can be combined with a change of
+       newline convention, for example, a pattern can start with:


          (*ANY)(*BSR_ANYCRLF)


@@ -3581,34 +3637,37 @@
        A word boundary is a position in the subject string where  the  current
        character  and  the previous character do not both match \w or \W (i.e.
        one matches \w and the other matches \W), or the start or  end  of  the
-       string if the first or last character matches \w, respectively.
+       string if the first or last character matches \w, respectively. Neither
+       PCRE nor Perl has a separate "start of word" or "end of  word"  metase-
+       quence.  However,  whatever follows \b normally determines which it is.
+       For example, the fragment \ba matches "a" at the start of a word.
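
The point that whatever follows or precedes \b decides whether it acts as a start-of-word or end-of-word test can be sketched as follows. This uses Python's re module rather than PCRE, purely as an assumption to keep the example runnable; for this plain ASCII input its \b follows the same \w/\W rule. The sample string is invented.

```python
import re

subject = "a cat and banana"  # hypothetical sample text

# \ba : an "a" that begins a word (the boundary is on its left)
starts = [m.start() for m in re.finditer(r"\ba", subject)]

# a\b : an "a" that ends a word (the boundary is on its right)
ends = [m.start() for m in re.finditer(r"a\b", subject)]

print(starts, ends)
```

The same \b assertion is used in both patterns; only its position relative to the letter changes which edge of the word is tested.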


-       The  \A,  \Z,  and \z assertions differ from the traditional circumflex
+       The \A, \Z, and \z assertions differ from  the  traditional  circumflex
        and dollar (described in the next section) in that they only ever match
-       at  the  very start and end of the subject string, whatever options are
-       set. Thus, they are independent of multiline mode. These  three  asser-
+       at the very start and end of the subject string, whatever  options  are
+       set.  Thus,  they are independent of multiline mode. These three asser-
        tions are not affected by the PCRE_NOTBOL or PCRE_NOTEOL options, which
-       affect only the behaviour of the circumflex and dollar  metacharacters.
-       However,  if the startoffset argument of pcre_exec() is non-zero, indi-
+       affect  only the behaviour of the circumflex and dollar metacharacters.
+       However, if the startoffset argument of pcre_exec() is non-zero,  indi-
        cating that matching is to start at a point other than the beginning of
-       the  subject,  \A  can never match. The difference between \Z and \z is
+       the subject, \A can never match. The difference between \Z  and  \z  is
        that \Z matches before a newline at the end of the string as well as at
        the very end, whereas \z matches only at the end.


-       The  \G assertion is true only when the current matching position is at
-       the start point of the match, as specified by the startoffset  argument
-       of  pcre_exec().  It  differs  from \A when the value of startoffset is
-       non-zero. By calling pcre_exec() multiple times with appropriate  argu-
+       The \G assertion is true only when the current matching position is  at
+       the  start point of the match, as specified by the startoffset argument
+       of pcre_exec(). It differs from \A when the  value  of  startoffset  is
+       non-zero.  By calling pcre_exec() multiple times with appropriate argu-
        ments, you can mimic Perl's /g option, and it is in this kind of imple-
        mentation where \G can be useful.
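
The idea of mimicking Perl's /g by calling the matcher repeatedly with an advancing start offset can be sketched as below. This is only an illustration in Python's re module, not the PCRE API: the pos argument of Pattern.search() stands in for the startoffset argument of pcre_exec(), and the helper name find_all is hypothetical.

```python
import re

def find_all(pattern, subject):
    """Collect successive matches by restarting at the end of the previous one."""
    compiled = re.compile(pattern)
    results = []
    offset = 0
    while offset <= len(subject):
        m = compiled.search(subject, offset)  # offset plays the role of startoffset
        if m is None:
            break
        results.append((m.start(), m.group()))
        # Move past the match; after an empty match, step forward one
        # position so the loop cannot stall at the same offset.
        offset = m.end() if m.end() > m.start() else m.end() + 1
    return results

numbers = find_all(r"\d+", "a1b22c333")
empties = find_all(r"x*", "ab")
```

The explicit bump after an empty match is one simple way to guarantee progress; a real PCRE loop would more likely retry at the same offset with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED set before advancing.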


-       Note, however, that PCRE's interpretation of \G, as the  start  of  the
+       Note,  however,  that  PCRE's interpretation of \G, as the start of the
        current match, is subtly different from Perl's, which defines it as the
-       end of the previous match. In Perl, these can  be  different  when  the
-       previously  matched  string was empty. Because PCRE does just one match
+       end  of  the  previous  match. In Perl, these can be different when the
+       previously matched string was empty. Because PCRE does just  one  match
        at a time, it cannot reproduce this behaviour.


-       If all the alternatives of a pattern begin with \G, the  expression  is
+       If  all  the alternatives of a pattern begin with \G, the expression is
        anchored to the starting match position, and the "anchored" flag is set
        in the compiled regular expression.


@@ -3616,90 +3675,90 @@
CIRCUMFLEX AND DOLLAR

        Outside a character class, in the default matching mode, the circumflex
-       character  is  an  assertion  that is true only if the current matching
-       point is at the start of the subject string. If the  startoffset  argu-
-       ment  of  pcre_exec()  is  non-zero,  circumflex can never match if the
-       PCRE_MULTILINE option is unset. Inside a  character  class,  circumflex
+       character is an assertion that is true only  if  the  current  matching
+       point  is  at the start of the subject string. If the startoffset argu-
+       ment of pcre_exec() is non-zero, circumflex  can  never  match  if  the
+       PCRE_MULTILINE  option  is  unset. Inside a character class, circumflex
        has an entirely different meaning (see below).


-       Circumflex  need  not be the first character of the pattern if a number
-       of alternatives are involved, but it should be the first thing in  each
-       alternative  in  which  it appears if the pattern is ever to match that
-       branch. If all possible alternatives start with a circumflex, that  is,
-       if  the  pattern  is constrained to match only at the start of the sub-
-       ject, it is said to be an "anchored" pattern.  (There  are  also  other
+       Circumflex need not be the first character of the pattern if  a  number
+       of  alternatives are involved, but it should be the first thing in each
+       alternative in which it appears if the pattern is ever  to  match  that
+       branch.  If all possible alternatives start with a circumflex, that is,
+       if the pattern is constrained to match only at the start  of  the  sub-
+       ject,  it  is  said  to be an "anchored" pattern. (There are also other
        constructs that can cause a pattern to be anchored.)


-       A  dollar  character  is  an assertion that is true only if the current
-       matching point is at the end of  the  subject  string,  or  immediately
+       A dollar character is an assertion that is true  only  if  the  current
+       matching  point  is  at  the  end of the subject string, or immediately
        before a newline at the end of the string (by default). Dollar need not
-       be the last character of the pattern if a number  of  alternatives  are
-       involved,  but  it  should  be  the last item in any branch in which it
+       be  the  last  character of the pattern if a number of alternatives are
+       involved, but it should be the last item in  any  branch  in  which  it
        appears. Dollar has no special meaning in a character class.


-       The meaning of dollar can be changed so that it  matches  only  at  the
-       very  end  of  the string, by setting the PCRE_DOLLAR_ENDONLY option at
+       The  meaning  of  dollar  can be changed so that it matches only at the
+       very end of the string, by setting the  PCRE_DOLLAR_ENDONLY  option  at
        compile time. This does not affect the \Z assertion.


        The meanings of the circumflex and dollar characters are changed if the
-       PCRE_MULTILINE  option  is  set.  When  this  is the case, a circumflex
-       matches immediately after internal newlines as well as at the start  of
-       the  subject  string.  It  does not match after a newline that ends the
-       string. A dollar matches before any newlines in the string, as well  as
-       at  the very end, when PCRE_MULTILINE is set. When newline is specified
-       as the two-character sequence CRLF, isolated CR and  LF  characters  do
+       PCRE_MULTILINE option is set. When  this  is  the  case,  a  circumflex
+       matches  immediately after internal newlines as well as at the start of
+       the subject string. It does not match after a  newline  that  ends  the
+       string.  A dollar matches before any newlines in the string, as well as
+       at the very end, when PCRE_MULTILINE is set. When newline is  specified
+       as  the  two-character  sequence CRLF, isolated CR and LF characters do
        not indicate newlines.


-       For  example, the pattern /^abc$/ matches the subject string "def\nabc"
-       (where \n represents a newline) in multiline mode, but  not  otherwise.
-       Consequently,  patterns  that  are anchored in single line mode because
-       all branches start with ^ are not anchored in  multiline  mode,  and  a
-       match  for  circumflex  is  possible  when  the startoffset argument of
-       pcre_exec() is non-zero. The PCRE_DOLLAR_ENDONLY option is  ignored  if
+       For example, the pattern /^abc$/ matches the subject string  "def\nabc"
+       (where  \n  represents a newline) in multiline mode, but not otherwise.
+       Consequently, patterns that are anchored in single  line  mode  because
+       all  branches  start  with  ^ are not anchored in multiline mode, and a
+       match for circumflex is  possible  when  the  startoffset  argument  of
+       pcre_exec()  is  non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if
        PCRE_MULTILINE is set.


-       Note  that  the sequences \A, \Z, and \z can be used to match the start
-       and end of the subject in both modes, and if all branches of a  pattern
-       start  with  \A it is always anchored, whether or not PCRE_MULTILINE is
+       Note that the sequences \A, \Z, and \z can be used to match  the  start
+       and  end of the subject in both modes, and if all branches of a pattern
+       start with \A it is always anchored, whether or not  PCRE_MULTILINE  is
        set.
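
The /^abc$/ example above can be tried out with a small sketch. It uses Python's re module as a stand-in (an assumption; PCRE itself is called from C), where re.MULTILINE corresponds to PCRE_MULTILINE and LF is the only newline considered.

```python
import re

subject = "def\nabc"

# Default mode: ^ matches only at the very start of the subject.
single = re.search(r"^abc$", subject)

# Multiline mode: ^ also matches just after the internal newline.
multi = re.search(r"^abc$", subject, re.MULTILINE)

# By default, $ also matches just before a newline that ends the string.
before_final_newline = re.search(r"abc$", "abc\n")

# \A stays tied to the start of the subject even in multiline mode.
anchored = re.search(r"\Aabc", subject, re.MULTILINE)
```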



FULL STOP (PERIOD, DOT)

        Outside a character class, a dot in the pattern matches any one charac-
-       ter  in  the subject string except (by default) a character that signi-
-       fies the end of a line. In UTF-8 mode, the  matched  character  may  be
+       ter in the subject string except (by default) a character  that  signi-
+       fies  the  end  of  a line. In UTF-8 mode, the matched character may be
        more than one byte long.


-       When  a line ending is defined as a single character, dot never matches
-       that character; when the two-character sequence CRLF is used, dot  does
-       not  match  CR  if  it  is immediately followed by LF, but otherwise it
-       matches all characters (including isolated CRs and LFs). When any  Uni-
-       code  line endings are being recognized, dot does not match CR or LF or
+       When a line ending is defined as a single character, dot never  matches
+       that  character; when the two-character sequence CRLF is used, dot does
+       not match CR if it is immediately followed  by  LF,  but  otherwise  it
+       matches  all characters (including isolated CRs and LFs). When any Uni-
+       code line endings are being recognized, dot does not match CR or LF  or
        any of the other line ending characters.


-       The behaviour of dot with regard to newlines can  be  changed.  If  the
-       PCRE_DOTALL  option  is  set,  a dot matches any one character, without
+       The  behaviour  of  dot  with regard to newlines can be changed. If the
+       PCRE_DOTALL option is set, a dot matches  any  one  character,  without
        exception. If the two-character sequence CRLF is present in the subject
        string, it takes two dots to match it.


-       The  handling of dot is entirely independent of the handling of circum-
-       flex and dollar, the only relationship being  that  they  both  involve
+       The handling of dot is entirely independent of the handling of  circum-
+       flex  and  dollar,  the  only relationship being that they both involve
        newlines. Dot has no special meaning in a character class.
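
A short sketch of the dot behaviour described above, again in Python's re module rather than PCRE (an assumption for runnability). Python's dot excludes only LF by default and has no configurable newline convention, so only the LF case and the PCRE_DOTALL-style case are shown; re.DOTALL plays the role of PCRE_DOTALL.

```python
import re

# Without DOTALL, dot does not match the newline character.
default_dot = re.fullmatch(r"a.b", "a\nb")

# With DOTALL, dot matches any one character, newline included.
dotall = re.fullmatch(r"a.b", "a\nb", re.DOTALL)

# CRLF is two characters, so even with DOTALL one dot is not enough...
crlf_one_dot = re.fullmatch(r"a.b", "a\r\nb", re.DOTALL)

# ...and it takes two dots to match it.
crlf_two_dots = re.fullmatch(r"a..b", "a\r\nb", re.DOTALL)
```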



MATCHING A SINGLE BYTE

        Outside a character class, the escape sequence \C matches any one byte,
-       both in and out of UTF-8 mode. Unlike a  dot,  it  always  matches  any
-       line-ending  characters.  The  feature  is provided in Perl in order to
-       match individual bytes in UTF-8 mode. Because it breaks up UTF-8  char-
-       acters  into individual bytes, what remains in the string may be a mal-
-       formed UTF-8 string. For this reason, the \C escape  sequence  is  best
+       both  in  and  out  of  UTF-8 mode. Unlike a dot, it always matches any
+       line-ending characters. The feature is provided in  Perl  in  order  to
+       match  individual bytes in UTF-8 mode. Because it breaks up UTF-8 char-
+       acters into individual bytes, what remains in the string may be a  mal-
+       formed  UTF-8  string.  For this reason, the \C escape sequence is best
        avoided.


-       PCRE  does  not  allow \C to appear in lookbehind assertions (described
-       below), because in UTF-8 mode this would make it impossible  to  calcu-
+       PCRE does not allow \C to appear in  lookbehind  assertions  (described
+       below),  because  in UTF-8 mode this would make it impossible to calcu-
        late the length of the lookbehind.



@@ -3707,97 +3766,99 @@

        An opening square bracket introduces a character class, terminated by a
        closing square bracket. A closing square bracket on its own is not spe-
-       cial. If a closing square bracket is required as a member of the class,
-       it should be the first data character in the class  (after  an  initial
-       circumflex, if present) or escaped with a backslash.
+       cial by default.  However, if the PCRE_JAVASCRIPT_COMPAT option is set,
+       a lone closing square bracket causes a compile-time error. If a closing
+       square bracket is required as a member of the class, it should  be  the
+       first  data  character  in  the  class (after an initial circumflex, if
+       present) or escaped with a backslash.


-       A  character  class matches a single character in the subject. In UTF-8
-       mode, the character may occupy more than one byte. A matched  character
+       A character class matches a single character in the subject.  In  UTF-8
+       mode, the character may be more than one byte long. A matched character
        must be in the set of characters defined by the class, unless the first
-       character in the class definition is a circumflex, in  which  case  the
-       subject  character  must  not  be in the set defined by the class. If a
-       circumflex is actually required as a member of the class, ensure it  is
+       character  in  the  class definition is a circumflex, in which case the
+       subject character must not be in the set defined by  the  class.  If  a
+       circumflex  is actually required as a member of the class, ensure it is
        not the first character, or escape it with a backslash.


-       For  example, the character class [aeiou] matches any lower case vowel,
-       while [^aeiou] matches any character that is not a  lower  case  vowel.
+       For example, the character class [aeiou] matches any lower case  vowel,
+       while  [^aeiou]  matches  any character that is not a lower case vowel.
        Note that a circumflex is just a convenient notation for specifying the
-       characters that are in the class by enumerating those that are  not.  A
-       class  that starts with a circumflex is not an assertion: it still con-
-       sumes a character from the subject string, and therefore  it  fails  if
+       characters  that  are in the class by enumerating those that are not. A
+       class that starts with a circumflex is not an assertion; it still  con-
+       sumes  a  character  from the subject string, and therefore it fails if
        the current pointer is at the end of the string.


-       In  UTF-8 mode, characters with values greater than 255 can be included
-       in a class as a literal string of bytes, or by using the  \x{  escaping
+       In UTF-8 mode, characters with values greater than 255 can be  included
+       in  a  class as a literal string of bytes, or by using the \x{ escaping
        mechanism.


-       When  caseless  matching  is set, any letters in a class represent both
-       their upper case and lower case versions, so for  example,  a  caseless
-       [aeiou]  matches  "A"  as well as "a", and a caseless [^aeiou] does not
-       match "A", whereas a caseful version would. In UTF-8 mode, PCRE  always
-       understands  the  concept  of case for characters whose values are less
-       than 128, so caseless matching is always possible. For characters  with
-       higher  values,  the  concept  of case is supported if PCRE is compiled
-       with Unicode property support, but not otherwise.  If you want  to  use
-       caseless  matching  for  characters 128 and above, you must ensure that
-       PCRE is compiled with Unicode property support as well  as  with  UTF-8
-       support.
+       When caseless matching is set, any letters in a  class  represent  both
+       their  upper  case  and lower case versions, so for example, a caseless
+       [aeiou] matches "A" as well as "a", and a caseless  [^aeiou]  does  not
+       match  "A", whereas a caseful version would. In UTF-8 mode, PCRE always
+       understands the concept of case for characters whose  values  are  less
+       than  128, so caseless matching is always possible. For characters with
+       higher values, the concept of case is supported  if  PCRE  is  compiled
+       with  Unicode  property support, but not otherwise.  If you want to use
+       caseless matching in UTF-8 mode for characters 128 and above, you  must
+       ensure  that  PCRE is compiled with Unicode property support as well as
+       with UTF-8 support.


-       Characters  that  might  indicate  line breaks are never treated in any
-       special way  when  matching  character  classes,  whatever  line-ending
-       sequence  is  in  use,  and  whatever  setting  of  the PCRE_DOTALL and
+       Characters that might indicate line breaks are  never  treated  in  any
+       special  way  when  matching  character  classes,  whatever line-ending
+       sequence is in  use,  and  whatever  setting  of  the  PCRE_DOTALL  and
        PCRE_MULTILINE options is used. A class such as [^a] always matches one
        of these characters.


-       The  minus (hyphen) character can be used to specify a range of charac-
-       ters in a character  class.  For  example,  [d-m]  matches  any  letter
-       between  d  and  m,  inclusive.  If  a minus character is required in a
-       class, it must be escaped with a backslash  or  appear  in  a  position
-       where  it cannot be interpreted as indicating a range, typically as the
+       The minus (hyphen) character can be used to specify a range of  charac-
+       ters  in  a  character  class.  For  example,  [d-m] matches any letter
+       between d and m, inclusive. If a  minus  character  is  required  in  a
+       class,  it  must  be  escaped  with a backslash or appear in a position
+       where it cannot be interpreted as indicating a range, typically as  the
        first or last character in the class.


        It is not possible to have the literal character "]" as the end charac-
-       ter  of a range. A pattern such as [W-]46] is interpreted as a class of
-       two characters ("W" and "-") followed by a literal string "46]", so  it
-       would  match  "W46]"  or  "-46]". However, if the "]" is escaped with a
-       backslash it is interpreted as the end of range, so [W-\]46] is  inter-
-       preted  as a class containing a range followed by two other characters.
-       The octal or hexadecimal representation of "]" can also be used to  end
+       ter of a range. A pattern such as [W-]46] is interpreted as a class  of
+       two  characters ("W" and "-") followed by a literal string "46]", so it
+       would match "W46]" or "-46]". However, if the "]"  is  escaped  with  a
+       backslash  it is interpreted as the end of range, so [W-\]46] is inter-
+       preted as a class containing a range followed by two other  characters.
+       The  octal or hexadecimal representation of "]" can also be used to end
        a range.


-       Ranges  operate in the collating sequence of character values. They can
-       also  be  used  for  characters  specified  numerically,  for   example
-       [\000-\037].  In UTF-8 mode, ranges can include characters whose values
+       Ranges operate in the collating sequence of character values. They  can
+       also   be  used  for  characters  specified  numerically,  for  example
+       [\000-\037]. In UTF-8 mode, ranges can include characters whose  values
        are greater than 255, for example [\x{100}-\x{2ff}].


        If a range that includes letters is used when caseless matching is set,
        it matches the letters in either case. For example, [W-c] is equivalent
-       to [][\\^_`wxyzabc], matched caselessly,  and  in  non-UTF-8  mode,  if
-       character  tables  for  a French locale are in use, [\xc8-\xcb] matches
-       accented E characters in both cases. In UTF-8 mode, PCRE  supports  the
-       concept  of  case for characters with values greater than 128 only when
+       to  [][\\^_`wxyzabc],  matched  caselessly,  and  in non-UTF-8 mode, if
+       character tables for a French locale are in  use,  [\xc8-\xcb]  matches
+       accented  E  characters in both cases. In UTF-8 mode, PCRE supports the
+       concept of case for characters with values greater than 128  only  when
        it is compiled with Unicode property support.


-       The character types \d, \D, \p, \P, \s, \S, \w, and \W may also  appear
-       in  a  character  class,  and add the characters that they match to the
+       The  character types \d, \D, \p, \P, \s, \S, \w, and \W may also appear
+       in a character class, and add the characters that  they  match  to  the
        class. For example, [\dABCDEF] matches any hexadecimal digit. A circum-
-       flex  can  conveniently  be used with the upper case character types to
-       specify a more restricted set of characters  than  the  matching  lower
-       case  type.  For example, the class [^\W_] matches any letter or digit,
+       flex can conveniently be used with the upper case  character  types  to
+       specify  a  more  restricted  set of characters than the matching lower
+       case type. For example, the class [^\W_] matches any letter  or  digit,
        but not underscore.


-       The only metacharacters that are recognized in  character  classes  are
-       backslash,  hyphen  (only  where  it can be interpreted as specifying a
-       range), circumflex (only at the start), opening  square  bracket  (only
-       when  it can be interpreted as introducing a POSIX class name - see the
-       next section), and the terminating  closing  square  bracket.  However,
+       The  only  metacharacters  that are recognized in character classes are
+       backslash, hyphen (only where it can be  interpreted  as  specifying  a
+       range),  circumflex  (only  at the start), opening square bracket (only
+       when it can be interpreted as introducing a POSIX class name - see  the
+       next  section),  and  the  terminating closing square bracket. However,
        escaping other non-alphanumeric characters does no harm.
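
Several of the character class examples in this section can be checked with the sketch below. It relies on Python's re module, which happens to treat these particular classes the same way; that equivalence is an assumption of the example, not a statement about PCRE internals, and the helper name full is hypothetical.

```python
import re

def full(pattern, text):
    """True if the whole of text is matched by pattern."""
    return re.fullmatch(pattern, text) is not None

vowel = full(r"[aeiou]", "e")
non_vowel = full(r"[^aeiou]", "z")

# A negated class still has to consume a character, so it fails at end of string.
negated_at_end = re.search(r"[^aeiou]", "") is None

# [W-]46] is the class [W-] followed by the literal string "46]".
literal_tail = full(r"[W-]46]", "W46]") and full(r"[W-]46]", "-46]")

# With the "]" escaped, W-\] is a range, followed by "4" and "6".
range_hits = full(r"[W-\]46]", "Z") and full(r"[W-\]46]", "4")
range_miss = full(r"[W-\]46]", "5")

# [^\W_] is any letter or digit, but not underscore.
word_not_underscore = full(r"[^\W_]", "a") and full(r"[^\W_]", "7")
underscore = full(r"[^\W_]", "_")

# [\dABCDEF] adds the digits to the listed letters.
hex_digit = full(r"[\dABCDEF]", "C") and not full(r"[\dABCDEF]", "G")
```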



POSIX CHARACTER CLASSES

        Perl supports the POSIX notation for character classes. This uses names
-       enclosed by [: and :] within the enclosing square brackets.  PCRE  also
+       enclosed  by  [: and :] within the enclosing square brackets. PCRE also
        supports this notation. For example,


          [01[:alpha:]%]
@@ -3820,18 +3881,18 @@
          word     "word" characters (same as \w)
          xdigit   hexadecimal digits


-       The "space" characters are HT (9), LF (10), VT (11), FF (12), CR  (13),
-       and  space  (32). Notice that this list includes the VT character (code
+       The  "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
+       and space (32). Notice that this list includes the VT  character  (code
        11). This makes "space" different to \s, which does not include VT (for
        Perl compatibility).


-       The  name  "word"  is  a Perl extension, and "blank" is a GNU extension
-       from Perl 5.8. Another Perl extension is negation, which  is  indicated
+       The name "word" is a Perl extension, and "blank"  is  a  GNU  extension
+       from  Perl  5.8. Another Perl extension is negation, which is indicated
        by a ^ character after the colon. For example,


          [12[:^digit:]]


-       matches  "1", "2", or any non-digit. PCRE (and Perl) also recognize the
+       matches "1", "2", or any non-digit. PCRE (and Perl) also recognize  the
        POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but
        these are not supported, and an error is given if they are encountered.


@@ -3841,24 +3902,24 @@

VERTICAL BAR

-       Vertical bar characters are used to separate alternative patterns.  For
+       Vertical  bar characters are used to separate alternative patterns. For
        example, the pattern


          gilbert|sullivan


-       matches  either "gilbert" or "sullivan". Any number of alternatives may
-       appear, and an empty  alternative  is  permitted  (matching  the  empty
+       matches either "gilbert" or "sullivan". Any number of alternatives  may
+       appear,  and  an  empty  alternative  is  permitted (matching the empty
        string). The matching process tries each alternative in turn, from left
-       to right, and the first one that succeeds is used. If the  alternatives
-       are  within a subpattern (defined below), "succeeds" means matching the
+       to  right, and the first one that succeeds is used. If the alternatives
+       are within a subpattern (defined below), "succeeds" means matching  the
        rest of the main pattern as well as the alternative in the subpattern.
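
The left-to-right rule for alternatives can be illustrated with a minimal sketch in Python's re module (used here as an assumption in place of PCRE; both are backtracking matchers that try alternatives in order).

```python
import re

# Either alternative can match.
first = re.match(r"gilbert|sullivan", "sullivan").group()

# Alternatives are tried left to right, and the first that succeeds is used,
# even though a later one would match more of the subject.
ordered = re.match(r"a|ab", "ab").group()

# Inside a subpattern, "succeeds" includes matching the rest of the pattern:
# "a" alone fails because "c" cannot follow it, so "ab" is used instead.
grouped = re.fullmatch(r"(a|ab)c", "abc").group(1)

# An empty alternative matches the empty string.
empty_alt = re.fullmatch(r"gilbert|", "") is not None
```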



INTERNAL OPTION SETTING

-       The settings of the  PCRE_CASELESS,  PCRE_MULTILINE,  PCRE_DOTALL,  and
-       PCRE_EXTENDED  options  (which are Perl-compatible) can be changed from
-       within the pattern by  a  sequence  of  Perl  option  letters  enclosed
+       The  settings  of  the  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
+       PCRE_EXTENDED options (which are Perl-compatible) can be  changed  from
+       within  the  pattern  by  a  sequence  of  Perl option letters enclosed
        between "(?" and ")".  The option letters are


          i  for PCRE_CASELESS
@@ -3868,46 +3929,46 @@


        For example, (?im) sets caseless, multiline matching. It is also possi-
        ble to unset these options by preceding the letter with a hyphen, and a
-       combined  setting and unsetting such as (?im-sx), which sets PCRE_CASE-
-       LESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and  PCRE_EXTENDED,
-       is  also  permitted.  If  a  letter  appears  both before and after the
+       combined setting and unsetting such as (?im-sx), which sets  PCRE_CASE-
+       LESS  and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED,
+       is also permitted. If a  letter  appears  both  before  and  after  the
        hyphen, the option is unset.


-       The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and  PCRE_EXTRA
-       can  be changed in the same way as the Perl-compatible options by using
+       The  PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA
+       can be changed in the same way as the Perl-compatible options by  using
        the characters J, U and X respectively.


-       When one of these option changes occurs at  top  level  (that  is,  not
-       inside  subpattern parentheses), the change applies to the remainder of
+       When  one  of  these  option  changes occurs at top level (that is, not
+       inside subpattern parentheses), the change applies to the remainder  of
        the pattern that follows. If the change is placed right at the start of
        a pattern, PCRE extracts it into the global options (and it will there-
        fore show up in data extracted by the pcre_fullinfo() function).


-       An option change within a subpattern (see below for  a  description  of
+       An  option  change  within a subpattern (see below for a description of
        subpatterns) affects only that part of the current pattern that follows
        it, so


          (a(?i)b)c


        matches abc and aBc and no other strings (assuming PCRE_CASELESS is not
-       used).   By  this means, options can be made to have different settings
-       in different parts of the pattern. Any changes made in one  alternative
-       do  carry  on  into subsequent branches within the same subpattern. For
+       used).  By this means, options can be made to have  different  settings
+       in  different parts of the pattern. Any changes made in one alternative
+       do carry on into subsequent branches within the  same  subpattern.  For
        example,


          (a(?i)b|c)


-       matches "ab", "aB", "c", and "C", even though  when  matching  "C"  the
-       first  branch  is  abandoned before the option setting. This is because
-       the effects of option settings happen at compile time. There  would  be
+       matches  "ab",  "aB",  "c",  and "C", even though when matching "C" the
+       first branch is abandoned before the option setting.  This  is  because
+       the  effects  of option settings happen at compile time. There would be
        some very weird behaviour otherwise.


-       Note:  There  are  other  PCRE-specific  options that can be set by the
-       application when the compile or match functions  are  called.  In  some
+       Note: There are other PCRE-specific options that  can  be  set  by  the
+       application  when  the  compile  or match functions are called. In some
        cases the pattern can contain special leading sequences such as (*CRLF)
-       to override what the application has set or what  has  been  defaulted.
-       Details  are  given  in the section entitled "Newline sequences" above.
-       There is also the (*UTF8) leading sequence that  can  be  used  to  set
+       to  override  what  the application has set or what has been defaulted.
+       Details are given in the section entitled  "Newline  sequences"  above.
+       There  is  also  the  (*UTF8)  leading sequence that can be used to set
        UTF-8 mode; this is equivalent to setting the PCRE_UTF8 option.



@@ -3920,18 +3981,18 @@

          cat(aract|erpillar|)


-       matches one of the words "cat", "cataract", or  "caterpillar".  Without
-       the  parentheses,  it  would  match  "cataract", "erpillar" or an empty
+       matches  one  of the words "cat", "cataract", or "caterpillar". Without
+       the parentheses, it would match  "cataract",  "erpillar"  or  an  empty
        string.


-       2. It sets up the subpattern as  a  capturing  subpattern.  This  means
-       that,  when  the  whole  pattern  matches,  that portion of the subject
+       2.  It  sets  up  the  subpattern as a capturing subpattern. This means
+       that, when the whole pattern  matches,  that  portion  of  the  subject
        string that matched the subpattern is passed back to the caller via the
-       ovector  argument  of pcre_exec(). Opening parentheses are counted from
-       left to right (starting from 1) to obtain  numbers  for  the  capturing
+       ovector argument of pcre_exec(). Opening parentheses are  counted  from
+       left  to  right  (starting  from 1) to obtain numbers for the capturing
        subpatterns.


-       For  example,  if the string "the red king" is matched against the pat-
+       For example, if the string "the red king" is matched against  the  pat-
        tern


          the ((red|white) (king|queen))
@@ -3939,12 +4000,12 @@
        the captured substrings are "red king", "red", and "king", and are num-
        bered 1, 2, and 3, respectively.


-       The  fact  that  plain  parentheses  fulfil two functions is not always
-       helpful.  There are often times when a grouping subpattern is  required
-       without  a capturing requirement. If an opening parenthesis is followed
-       by a question mark and a colon, the subpattern does not do any  captur-
-       ing,  and  is  not  counted when computing the number of any subsequent
-       capturing subpatterns. For example, if the string "the white queen"  is
+       The fact that plain parentheses fulfil  two  functions  is  not  always
+       helpful.   There are often times when a grouping subpattern is required
+       without a capturing requirement. If an opening parenthesis is  followed
+       by  a question mark and a colon, the subpattern does not do any captur-
+       ing, and is not counted when computing the  number  of  any  subsequent
+       capturing  subpatterns. For example, if the string "the white queen" is
        matched against the pattern


          the ((?:red|white) (king|queen))
@@ -3952,47 +4013,60 @@
        the captured substrings are "white queen" and "queen", and are numbered
        1 and 2. The maximum number of capturing subpatterns is 65535.
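
Illustration (not part of the committed text): the numbering rules above
can be checked with a short sketch. It uses Python's standard re module
rather than PCRE itself, on the assumption that the two agree for plain
and (?:...) groups; the variable names are made up for the example.

```python
import re

# Plain parentheses both group and capture; groups are numbered by the
# position of their opening parenthesis, starting from 1.
capturing = re.fullmatch(r"the ((red|white) (king|queen))", "the red king")
all_groups = capturing.groups()

# (?:...) groups without capturing and is skipped when numbering.
grouping_only = re.fullmatch(r"the ((?:red|white) (king|queen))",
                             "the white queen")
fewer_groups = grouping_only.groups()

print(all_groups)    # ('red king', 'red', 'king')
print(fewer_groups)  # ('white queen', 'queen')
```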


-       As a convenient shorthand, if any option settings are required  at  the
-       start  of  a  non-capturing  subpattern,  the option letters may appear
+       As  a  convenient shorthand, if any option settings are required at the
+       start of a non-capturing subpattern,  the  option  letters  may  appear
        between the "?" and the ":". Thus the two patterns


          (?i:saturday|sunday)
          (?:(?i)saturday|sunday)


        match exactly the same set of strings. Because alternative branches are
-       tried  from  left  to right, and options are not reset until the end of
-       the subpattern is reached, an option setting in one branch does  affect
-       subsequent  branches,  so  the above patterns match "SUNDAY" as well as
+       tried from left to right, and options are not reset until  the  end  of
+       the  subpattern is reached, an option setting in one branch does affect
+       subsequent branches, so the above patterns match "SUNDAY"  as  well  as
        "Saturday".

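Illustration (not part of the committed text): a minimal sketch of the
(?i:...) shorthand, again using Python's re module as a stand-in for
PCRE. Only the scoped form is shown, because recent Python versions
reject an option setting such as (?i) that is not at the start of the
pattern, which PCRE permits. The variable names are hypothetical.

```python
import re

# The option letters between "?" and ":" apply to every branch of the
# non-capturing group.
weekend = re.compile(r"(?i:saturday|sunday)")
matches_upper = weekend.fullmatch("SUNDAY") is not None
matches_mixed = weekend.fullmatch("Saturday") is not None

# The setting ends with the group: "urday" is still case-sensitive here.
prefix_only = re.compile(r"(?i:sat)urday")
scoped_ok = prefix_only.fullmatch("SATurday") is not None
scoped_fail = prefix_only.fullmatch("SATURDAY") is None

print(matches_upper, matches_mixed, scoped_ok, scoped_fail)
```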


DUPLICATE SUBPATTERN NUMBERS

        Perl 5.10 introduced a feature whereby each alternative in a subpattern
-       uses  the same numbers for its capturing parentheses. Such a subpattern
-       starts with (?| and is itself a non-capturing subpattern. For  example,
+       uses the same numbers for its capturing parentheses. Such a  subpattern
+       starts  with (?| and is itself a non-capturing subpattern. For example,
        consider this pattern:


          (?|(Sat)ur|(Sun))day


-       Because  the two alternatives are inside a (?| group, both sets of cap-
-       turing parentheses are numbered one. Thus, when  the  pattern  matches,
-       you  can  look  at captured substring number one, whichever alternative
-       matched. This construct is useful when you want to  capture  part,  but
+       Because the two alternatives are inside a (?| group, both sets of  cap-
+       turing  parentheses  are  numbered one. Thus, when the pattern matches,
+       you can look at captured substring number  one,  whichever  alternative
+       matched.  This  construct  is useful when you want to capture part, but
        not all, of one of a number of alternatives. Inside a (?| group, paren-
-       theses are numbered as usual, but the number is reset at the  start  of
-       each  branch. The numbers of any capturing buffers that follow the sub-
-       pattern start after the highest number used in any branch. The  follow-
-       ing  example  is taken from the Perl documentation.  The numbers under-
+       theses  are  numbered as usual, but the number is reset at the start of
+       each branch. The numbers of any capturing buffers that follow the  sub-
+       pattern  start after the highest number used in any branch. The follow-
+       ing example is taken from the Perl documentation.  The  numbers  under-
        neath show in which buffer the captured content will be stored.


          # before  ---------------branch-reset----------- after
          / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
          # 1            2         2  3        2     3     4


-       A backreference or a recursive call to  a  numbered  subpattern  always
-       refers to the first one in the pattern with the given number.
+       A  backreference  to  a  numbered subpattern uses the most recent value
+       that is set for that number by any subpattern.  The  following  pattern
+       matches "abcabc" or "defdef":


+         /(?|(abc)|(def))\1/
+
+       In  contrast, a recursive or "subroutine" call to a numbered subpattern
+       always refers to the first one in the pattern with  the  given  number.
+       The following pattern matches "abcabc" or "defabc":
+
+         /(?|(abc)|(def))(?1)/
+
+       If  a condition test for a subpattern's having matched refers to a non-
+       unique number, the test is true if any of the subpatterns of that  num-
+       ber have matched.
+
        An  alternative approach to using this "branch reset" feature is to use
        duplicate named subpatterns, as described in the next section.


@@ -4006,26 +4080,29 @@
        patterns. This feature was not added to Perl until release 5.10. Python
        had  the  feature earlier, and PCRE introduced it at release 4.0, using
        the Python syntax. PCRE now supports both the Perl and the Python  syn-
-       tax.
+       tax.  Perl  allows  identically  numbered subpatterns to have different
+       names, but PCRE does not.


-       In  PCRE,  a subpattern can be named in one of three ways: (?<name>...)
-       or (?'name'...) as in Perl, or (?P<name>...) as in  Python.  References
+       In PCRE, a subpattern can be named in one of three  ways:  (?<name>...)
+       or  (?'name'...)  as in Perl, or (?P<name>...) as in Python. References
        to capturing parentheses from other parts of the pattern, such as back-
-       references, recursion, and conditions, can be made by name as  well  as
+       references,  recursion,  and conditions, can be made by name as well as
        by number.


-       Names  consist  of  up  to  32 alphanumeric characters and underscores.
-       Named capturing parentheses are still  allocated  numbers  as  well  as
-       names,  exactly as if the names were not present. The PCRE API provides
+       Names consist of up to  32  alphanumeric  characters  and  underscores.
+       Named  capturing  parentheses  are  still  allocated numbers as well as
+       names, exactly as if the names were not present. The PCRE API  provides
        function calls for extracting the name-to-number translation table from
        a compiled pattern. There is also a convenience function for extracting
        a captured substring by name.
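
Illustration (not part of the committed text): the relationship between
names and numbers can be sketched with Python's re module, which accepts
the (?P<name>...) syntax. Its groupindex attribute plays the role of the
name-to-number table here; the PCRE API calls themselves are described
in the pcreapi documentation and are not shown. The pattern and names
are invented for the example.

```python
import re

# Two named groups and one unnamed group; all three receive numbers.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(\d{2})")

# Name-to-number table: named groups are numbered as if unnamed.
name_table = dict(date.groupindex)

m = date.fullmatch("2009-10-05")
by_name = m.group("year")   # extract by name
by_number = m.group(1)      # same substring, extracted by number

print(name_table)           # {'year': 1, 'month': 2}
print(by_name, by_number)   # 2009 2009
```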


-       By default, a name must be unique within a pattern, but it is  possible
+       By  default, a name must be unique within a pattern, but it is possible
        to relax this constraint by setting the PCRE_DUPNAMES option at compile
-       time. This can be useful for patterns where only one  instance  of  the
-       named  parentheses  can  match. Suppose you want to match the name of a
-       weekday, either as a 3-letter abbreviation or as the full name, and  in
+       time.  (Duplicate  names are also always permitted for subpatterns with
+       the same number, set up as described in the previous  section.)  Dupli-
+       cate  names  can  be useful for patterns where only one instance of the
+       named parentheses can match. Suppose you want to match the  name  of  a
+       weekday,  either as a 3-letter abbreviation or as the full name, and in
        both cases you want to extract the abbreviation. This pattern (ignoring
        the line breaks) does the job:


@@ -4035,26 +4112,38 @@
          (?<DN>Thu)(?:rsday)?|
          (?<DN>Sat)(?:urday)?


-       There are five capturing substrings, but only one is ever set  after  a
+       There  are  five capturing substrings, but only one is ever set after a
        match.  (An alternative way of solving this problem is to use a "branch
        reset" subpattern, as described in the previous section.)


-       The convenience function for extracting the data by  name  returns  the
-       substring  for  the first (and in this example, the only) subpattern of
-       that name that matched. This saves searching  to  find  which  numbered
-       subpattern  it  was. If you make a reference to a non-unique named sub-
-       pattern from elsewhere in the pattern, the one that corresponds to  the
-       lowest  number  is used. For further details of the interfaces for han-
-       dling named subpatterns, see the pcreapi documentation.
+       The  convenience  function  for extracting the data by name returns the
+       substring for the first (and in this example, the only)  subpattern  of
+       that  name  that  matched.  This saves searching to find which numbered
+       subpattern it was.


+       If you make a backreference to a non-unique named subpattern from else-
+       where  in the pattern, the one that corresponds to the first occurrence
+       of the name is used. In the absence of duplicate numbers (see the  pre-
+       vious  section)  this  is  the one with the lowest number. If you use a
+       named reference in a condition test (see the section  about  conditions
+       below),  either  to check whether a subpattern has matched, or to check
+       for recursion, all subpatterns with the same name are  tested.  If  the
+       condition  is  true for any one of them, the overall condition is true.
+       This is the same behaviour as testing by number. For further details of
+       the interfaces for handling named subpatterns, see the pcreapi documen-
+       tation.
+
        Warning: You cannot use different names to distinguish between two sub-
-       patterns  with  the same number (see the previous section) because PCRE
-       uses only the numbers when matching.
+       patterns  with  the same number because PCRE uses only the numbers when
+       matching. For this reason, an error is given at compile time if differ-
+       ent  names  are given to subpatterns with the same number. However, you
+       can give the same name to subpatterns with the same number,  even  when
+       PCRE_DUPNAMES is not set.



REPETITION

-       Repetition is specified by quantifiers, which can  follow  any  of  the
+       Repetition  is  specified  by  quantifiers, which can follow any of the
        following items:


          a literal data character
@@ -4066,18 +4155,19 @@
          a character class
          a back reference (see next section)
          a parenthesized subpattern (unless it is an assertion)
+         a recursive or "subroutine" call to a subpattern


-       The  general repetition quantifier specifies a minimum and maximum num-
-       ber of permitted matches, by giving the two numbers in  curly  brackets
-       (braces),  separated  by  a comma. The numbers must be less than 65536,
+       The general repetition quantifier specifies a minimum and maximum  num-
+       ber  of  permitted matches, by giving the two numbers in curly brackets
+       (braces), separated by a comma. The numbers must be  less  than  65536,
        and the first must be less than or equal to the second. For example:


          z{2,4}


-       matches "zz", "zzz", or "zzzz". A closing brace on its  own  is  not  a
-       special  character.  If  the second number is omitted, but the comma is
-       present, there is no upper limit; if the second number  and  the  comma
-       are  both omitted, the quantifier specifies an exact number of required
+       matches  "zz",  "zzz",  or  "zzzz". A closing brace on its own is not a
+       special character. If the second number is omitted, but  the  comma  is
+       present,  there  is  no upper limit; if the second number and the comma
+       are both omitted, the quantifier specifies an exact number of  required
        matches. Thus


          [aeiou]{3,}
@@ -4086,49 +4176,49 @@


          \d{8}


-       matches exactly 8 digits. An opening curly bracket that  appears  in  a
-       position  where a quantifier is not allowed, or one that does not match
-       the syntax of a quantifier, is taken as a literal character. For  exam-
+       matches  exactly  8  digits. An opening curly bracket that appears in a
+       position where a quantifier is not allowed, or one that does not  match
+       the  syntax of a quantifier, is taken as a literal character. For exam-
        ple, {,6} is not a quantifier, but a literal string of four characters.
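
Illustration (not part of the committed text): the brace quantifiers can
be tried out with Python's re module, which uses the same {min,max},
{min,} and {n} forms. The {,6} case is deliberately left out, because
Python treats it as a quantifier whereas PCRE, as stated above, takes it
literally.

```python
import re

# z{2,4}: between two and four repeats.
z_results = [re.fullmatch(r"z{2,4}", "z" * n) is not None
             for n in range(1, 6)]

# [aeiou]{3,}: at least three vowels, no upper limit.
three_or_more = re.fullmatch(r"[aeiou]{3,}", "aeiou") is not None
too_few = re.fullmatch(r"[aeiou]{3,}", "ae") is None

# \d{8}: exactly eight digits.
exactly_eight = re.fullmatch(r"\d{8}", "20091005") is not None
only_seven = re.fullmatch(r"\d{8}", "2009100") is None

print(z_results)  # [False, True, True, True, False]
```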


-       In  UTF-8  mode,  quantifiers  apply to UTF-8 characters rather than to
+       In UTF-8 mode, quantifiers apply to UTF-8  characters  rather  than  to
        individual bytes. Thus, for example, \x{100}{2} matches two UTF-8 char-
        acters, each of which is represented by a two-byte sequence. Similarly,
        when Unicode property support is available, \X{3} matches three Unicode
-       extended  sequences,  each of which may be several bytes long (and they
+       extended sequences, each of which may be several bytes long  (and  they
        may be of different lengths).


        The quantifier {0} is permitted, causing the expression to behave as if
        the previous item and the quantifier were not present. This may be use-
-       ful for subpatterns that are referenced as subroutines  from  elsewhere
+       ful  for  subpatterns that are referenced as subroutines from elsewhere
        in the pattern. Items other than subpatterns that have a {0} quantifier
        are omitted from the compiled pattern.


-       For convenience, the three most common quantifiers have  single-charac-
+       For  convenience, the three most common quantifiers have single-charac-
        ter abbreviations:


          *    is equivalent to {0,}
          +    is equivalent to {1,}
          ?    is equivalent to {0,1}


-       It  is  possible  to construct infinite loops by following a subpattern
+       It is possible to construct infinite loops by  following  a  subpattern
        that can match no characters with a quantifier that has no upper limit,
        for example:


          (a?)*


        Earlier versions of Perl and PCRE used to give an error at compile time
-       for such patterns. However, because there are cases where this  can  be
-       useful,  such  patterns  are now accepted, but if any repetition of the
-       subpattern does in fact match no characters, the loop is forcibly  bro-
+       for  such  patterns. However, because there are cases where this can be
+       useful, such patterns are now accepted, but if any  repetition  of  the
+       subpattern  does in fact match no characters, the loop is forcibly bro-
        ken.


-       By  default,  the quantifiers are "greedy", that is, they match as much
-       as possible (up to the maximum  number  of  permitted  times),  without
-       causing  the  rest of the pattern to fail. The classic example of where
+       By default, the quantifiers are "greedy", that is, they match  as  much
+       as  possible  (up  to  the  maximum number of permitted times), without
+       causing the rest of the pattern to fail. The classic example  of  where
        this gives problems is in trying to match comments in C programs. These
-       appear  between  /*  and  */ and within the comment, individual * and /
-       characters may appear. An attempt to match C comments by  applying  the
+       appear between /* and */ and within the comment,  individual  *  and  /
+       characters  may  appear. An attempt to match C comments by applying the
        pattern


          /\*.*\*/
@@ -4137,19 +4227,19 @@


          /* first comment */  not comment  /* second comment */


-       fails,  because it matches the entire string owing to the greediness of
+       fails, because it matches the entire string owing to the greediness  of
        the .*  item.


-       However, if a quantifier is followed by a question mark, it  ceases  to
+       However,  if  a quantifier is followed by a question mark, it ceases to
        be greedy, and instead matches the minimum number of times possible, so
        the pattern


          /\*.*?\*/


-       does the right thing with the C comments. The meaning  of  the  various
-       quantifiers  is  not  otherwise  changed,  just the preferred number of
-       matches.  Do not confuse this use of question mark with its  use  as  a
-       quantifier  in its own right. Because it has two uses, it can sometimes
+       does  the  right  thing with the C comments. The meaning of the various
+       quantifiers is not otherwise changed,  just  the  preferred  number  of
+       matches.   Do  not  confuse this use of question mark with its use as a
+       quantifier in its own right. Because it has two uses, it can  sometimes
        appear doubled, as in


          \d??\d
@@ -4157,36 +4247,36 @@
        which matches one digit by preference, but can match two if that is the
        only way the rest of the pattern matches.
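
Illustration (not part of the committed text): the C comment example and
the doubled question mark can be reproduced with Python's re module,
whose greedy and minimizing quantifiers behave like PCRE's for these
patterns. The variable names are hypothetical.

```python
import re

subject = "/* first comment */  not comment  /* second comment */"

# Greedy .* runs to the end and backs off only as far as the last */,
# so the single match is the entire string.
greedy = re.findall(r"/\*.*\*/", subject)

# Minimizing .*? stops at the first */ it can, giving one match per
# comment.
lazy = re.findall(r"/\*.*?\*/", subject)

# \d??\d prefers one digit, but takes two when the rest of the match
# requires it (here, fullmatch forces the whole string to be consumed).
prefers_one = re.match(r"\d??\d", "12").group()
forced_two = re.fullmatch(r"\d??\d", "12").group()

print(lazy)  # ['/* first comment */', '/* second comment */']
```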


-       If  the PCRE_UNGREEDY option is set (an option that is not available in
-       Perl), the quantifiers are not greedy by default, but  individual  ones
-       can  be  made  greedy  by following them with a question mark. In other
+       If the PCRE_UNGREEDY option is set (an option that is not available  in
+       Perl),  the  quantifiers are not greedy by default, but individual ones
+       can be made greedy by following them with a  question  mark.  In  other
        words, it inverts the default behaviour.


-       When a parenthesized subpattern is quantified  with  a  minimum  repeat
-       count  that is greater than 1 or with a limited maximum, more memory is
-       required for the compiled pattern, in proportion to  the  size  of  the
+       When  a  parenthesized  subpattern  is quantified with a minimum repeat
+       count that is greater than 1 or with a limited maximum, more memory  is
+       required  for  the  compiled  pattern, in proportion to the size of the
        minimum or maximum.


        If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equiv-
-       alent to Perl's /s) is set, thus allowing the dot  to  match  newlines,
-       the  pattern  is  implicitly anchored, because whatever follows will be
-       tried against every character position in the subject string, so  there
-       is  no  point  in  retrying the overall match at any position after the
-       first. PCRE normally treats such a pattern as though it  were  preceded
+       alent  to  Perl's  /s) is set, thus allowing the dot to match newlines,
+       the pattern is implicitly anchored, because whatever  follows  will  be
+       tried  against every character position in the subject string, so there
+       is no point in retrying the overall match at  any  position  after  the
+       first.  PCRE  normally treats such a pattern as though it were preceded
        by \A.


-       In  cases  where  it  is known that the subject string contains no new-
-       lines, it is worth setting PCRE_DOTALL in order to  obtain  this  opti-
+       In cases where it is known that the subject  string  contains  no  new-
+       lines,  it  is  worth setting PCRE_DOTALL in order to obtain this opti-
        mization, or alternatively using ^ to indicate anchoring explicitly.


-       However,  there is one situation where the optimization cannot be used.
-       When .*  is inside capturing parentheses that  are  the  subject  of  a
-       backreference  elsewhere  in the pattern, a match at the start may fail
+       However, there is one situation where the optimization cannot be  used.
+       When  .*   is  inside  capturing  parentheses that are the subject of a
+       backreference elsewhere in the pattern, a match at the start  may  fail
        where a later one succeeds. Consider, for example:


          (.*)abc\1


-       If the subject is "xyz123abc123" the match point is the fourth  charac-
+       If  the subject is "xyz123abc123" the match point is the fourth charac-
        ter. For this reason, such a pattern is not implicitly anchored.


        When a capturing subpattern is repeated, the value captured is the sub-
@@ -4195,8 +4285,8 @@
          (tweedle[dume]{3}\s*)+


        has matched "tweedledum tweedledee" the value of the captured substring
-       is  "tweedledee".  However,  if there are nested capturing subpatterns,
-       the corresponding captured values may have been set in previous  itera-
+       is "tweedledee". However, if there are  nested  capturing  subpatterns,
+       the  corresponding captured values may have been set in previous itera-
        tions. For example, after


          /(a|(b))+/
@@ -4206,53 +4296,53 @@


ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS

-       With  both  maximizing ("greedy") and minimizing ("ungreedy" or "lazy")
-       repetition, failure of what follows normally causes the  repeated  item
-       to  be  re-evaluated to see if a different number of repeats allows the
-       rest of the pattern to match. Sometimes it is useful to  prevent  this,
-       either to change the nature of the match, or to cause it to fail earlier
-       than it otherwise might, when the author of the pattern knows there  is
+       With both maximizing ("greedy") and minimizing ("ungreedy"  or  "lazy")
+       repetition,  failure  of what follows normally causes the repeated item
+       to be re-evaluated to see if a different number of repeats  allows  the
+       rest  of  the pattern to match. Sometimes it is useful to prevent this,
+       either to change the nature of the match, or to cause it to fail earlier
+       than  it otherwise might, when the author of the pattern knows there is
        no point in carrying on.


-       Consider,  for  example, the pattern \d+foo when applied to the subject
+       Consider, for example, the pattern \d+foo when applied to  the  subject
        line


          123456bar


        After matching all 6 digits and then failing to match "foo", the normal
-       action  of  the matcher is to try again with only 5 digits matching the
-       \d+ item, and then with  4,  and  so  on,  before  ultimately  failing.
-       "Atomic  grouping"  (a  term taken from Jeffrey Friedl's book) provides
-       the means for specifying that once a subpattern has matched, it is  not
+       action of the matcher is to try again with only 5 digits  matching  the
+       \d+  item,  and  then  with  4,  and  so on, before ultimately failing.
+       "Atomic grouping" (a term taken from Jeffrey  Friedl's  book)  provides
+       the  means for specifying that once a subpattern has matched, it is not
        to be re-evaluated in this way.


-       If  we  use atomic grouping for the previous example, the matcher gives
-       up immediately on failing to match "foo" the first time.  The  notation
+       If we use atomic grouping for the previous example, the  matcher  gives
+       up  immediately  on failing to match "foo" the first time. The notation
        is a kind of special parenthesis, starting with (?> as in this example:


          (?>\d+)foo


-       This  kind  of  parenthesis "locks up" the  part of the pattern it con-
-       tains once it has matched, and a failure further into  the  pattern  is
-       prevented  from  backtracking into it. Backtracking past it to previous
+       This kind of parenthesis "locks up" the  part of the  pattern  it  con-
+       tains  once  it  has matched, and a failure further into the pattern is
+       prevented from backtracking into it. Backtracking past it  to  previous
        items, however, works as normal.


-       An alternative description is that a subpattern of  this  type  matches
-       the  string  of  characters  that an identical standalone pattern would
+       An  alternative  description  is that a subpattern of this type matches
+       the string of characters that an  identical  standalone  pattern  would
        match, if anchored at the current point in the subject string.


        Atomic grouping subpatterns are not capturing subpatterns. Simple cases
        such as the above example can be thought of as a maximizing repeat that
-       must swallow everything it can. So, while both \d+ and  \d+?  are  pre-
-       pared  to  adjust  the number of digits they match in order to make the
+       must  swallow  everything  it can. So, while both \d+ and \d+? are pre-
+       pared to adjust the number of digits they match in order  to  make  the
        rest of the pattern match, (?>\d+) can only match an entire sequence of
        digits.
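
Illustration (not part of the committed text): the "locking" effect can
be sketched in Python's re module. To stay independent of the Python
version, the sketch does not use (?>...) directly; it substitutes a
well-known emulation, a lookahead that captures followed by a
backreference, which relies on lookaheads not being re-entered once they
have matched. (Python 3.11 and later are understood to accept (?>...)
natively.) The group name "digits" is made up for the example.

```python
import re

# Ordinary repeat: \d+ gives back a digit so that the final "4" matches.
plain = re.compile(r"\d+4")

# Emulated atomic group: the lookahead captures the whole digit run, the
# backreference consumes it, and nothing can be given back afterwards.
atomic_like = re.compile(r"(?=(?P<digits>\d+))(?P=digits)4")

plain_hit = plain.match("1234") is not None        # backtracks, succeeds
atomic_miss = atomic_like.match("1234") is None    # cannot backtrack

# When no backtracking is needed, both forms match the same text.
foo_plain = re.match(r"\d+foo", "123456foo").group()
foo_atomic = re.match(r"(?=(?P<digits>\d+))(?P=digits)foo",
                      "123456foo").group()

print(plain_hit, atomic_miss, foo_plain == foo_atomic)
```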


-       Atomic  groups in general can of course contain arbitrarily complicated
-       subpatterns, and can be nested. However, when  the  subpattern  for  an
+       Atomic groups in general can of course contain arbitrarily  complicated
+       subpatterns,  and  can  be  nested. However, when the subpattern for an
        atomic group is just a single repeated item, as in the example above, a
-       simpler notation, called a "possessive quantifier" can  be  used.  This
-       consists  of  an  additional  + character following a quantifier. Using
+       simpler  notation,  called  a "possessive quantifier" can be used. This
+       consists of an additional + character  following  a  quantifier.  Using
        this notation, the previous example can be rewritten as


          \d++foo
@@ -4262,45 +4352,45 @@


          (abc|xyz){2,3}+


-       Possessive   quantifiers   are   always  greedy;  the  setting  of  the
+       Possessive  quantifiers  are  always  greedy;  the   setting   of   the
        PCRE_UNGREEDY option is ignored. They are a convenient notation for the
-       simpler  forms  of atomic group. However, there is no difference in the
-       meaning of a possessive quantifier and  the  equivalent  atomic  group,
-       though  there  may  be a performance difference; possessive quantifiers
+       simpler forms of atomic group. However, there is no difference  in  the
+       meaning  of  a  possessive  quantifier and the equivalent atomic group,
+       though there may be a performance  difference;  possessive  quantifiers
        should be slightly faster.


-       The possessive quantifier syntax is an extension to the Perl  5.8  syn-
-       tax.   Jeffrey  Friedl  originated the idea (and the name) in the first
+       The  possessive  quantifier syntax is an extension to the Perl 5.8 syn-
+       tax.  Jeffrey Friedl originated the idea (and the name)  in  the  first
        edition of his book. Mike McCloskey liked it, so implemented it when he
-       built  Sun's Java package, and PCRE copied it from there. It ultimately
+       built Sun's Java package, and PCRE copied it from there. It  ultimately
        found its way into Perl at release 5.10.


        PCRE has an optimization that automatically "possessifies" certain sim-
-       ple  pattern  constructs.  For  example, the sequence A+B is treated as
-       A++B because there is no point in backtracking into a sequence  of  A's
+       ple pattern constructs. For example, the sequence  A+B  is  treated  as
+       A++B  because  there is no point in backtracking into a sequence of A's
        when B must follow.


-       When  a  pattern  contains an unlimited repeat inside a subpattern that
-       can itself be repeated an unlimited number of  times,  the  use  of  an
-       atomic  group  is  the  only way to avoid some failing matches taking a
+       When a pattern contains an unlimited repeat inside  a  subpattern  that
+       can  itself  be  repeated  an  unlimited number of times, the use of an
+       atomic group is the only way to avoid some  failing  matches  taking  a
        very long time indeed. The pattern


          (\D+|<\d+>)*[!?]


-       matches an unlimited number of substrings that either consist  of  non-
-       digits,  or  digits  enclosed in <>, followed by either ! or ?. When it
+       matches  an  unlimited number of substrings that either consist of non-
+       digits, or digits enclosed in <>, followed by either ! or  ?.  When  it
        matches, it runs quickly. However, if it is applied to


          aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa


-       it takes a long time before reporting  failure.  This  is  because  the
-       string  can be divided between the internal \D+ repeat and the external
-       * repeat in a large number of ways, and all  have  to  be  tried.  (The
-       example  uses  [!?]  rather than a single character at the end, because
-       both PCRE and Perl have an optimization that allows  for  fast  failure
-       when  a single character is used. They remember the last single charac-
-       ter that is required for a match, and fail early if it is  not  present
-       in  the  string.)  If  the pattern is changed so that it uses an atomic
+       it  takes  a  long  time  before reporting failure. This is because the
+       string can be divided between the internal \D+ repeat and the  external
+       *  repeat  in  a  large  number of ways, and all have to be tried. (The
+       example uses [!?] rather than a single character at  the  end,  because
+       both  PCRE  and  Perl have an optimization that allows for fast failure
+       when a single character is used. They remember the last single  charac-
+       ter  that  is required for a match, and fail early if it is not present
+       in the string.) If the pattern is changed so that  it  uses  an  atomic
        group, like this:


          ((?>\D+)|<\d+>)*[!?]
@@ -4312,37 +4402,37 @@


        Outside a character class, a backslash followed by a digit greater than
        0 (and possibly further digits) is a back reference to a capturing sub-
-       pattern earlier (that is, to its left) in the pattern,  provided  there
+       pattern  earlier  (that is, to its left) in the pattern, provided there
        have been that many previous capturing left parentheses.


        However, if the decimal number following the backslash is less than 10,
-       it is always taken as a back reference, and causes  an  error  only  if
-       there  are  not that many capturing left parentheses in the entire pat-
-       tern. In other words, the parentheses that are referenced need  not  be
-       to  the left of the reference for numbers less than 10. A "forward back
-       reference" of this type can make sense when a  repetition  is  involved
-       and  the  subpattern to the right has participated in an earlier itera-
+       it  is  always  taken  as a back reference, and causes an error only if
+       there are not that many capturing left parentheses in the  entire  pat-
+       tern.  In  other words, the parentheses that are referenced need not be
+       to the left of the reference for numbers less than 10. A "forward  back
+       reference"  of  this  type can make sense when a repetition is involved
+       and the subpattern to the right has participated in an  earlier  itera-
        tion.


-       It is not possible to have a numerical "forward back  reference"  to  a
-       subpattern  whose  number  is  10  or  more using this syntax because a
-       sequence such as \50 is interpreted as a character  defined  in  octal.
+       It  is  not  possible to have a numerical "forward back reference" to a
+       subpattern whose number is 10 or  more  using  this  syntax  because  a
+       sequence  such  as  \50 is interpreted as a character defined in octal.
        See the subsection entitled "Non-printing characters" above for further
-       details of the handling of digits following a backslash.  There  is  no
-       such  problem  when named parentheses are used. A back reference to any
+       details  of  the  handling of digits following a backslash. There is no
+       such problem when named parentheses are used. A back reference  to  any
        subpattern is possible using named parentheses (see below).


-       Another way of avoiding the ambiguity inherent in  the  use  of  digits
+       Another  way  of  avoiding  the ambiguity inherent in the use of digits
        following a backslash is to use the \g escape sequence, which is a fea-
-       ture introduced in Perl 5.10.  This  escape  must  be  followed  by  an
-       unsigned  number  or  a negative number, optionally enclosed in braces.
+       ture  introduced  in  Perl  5.10.  This  escape  must be followed by an
+       unsigned number or a negative number, optionally  enclosed  in  braces.
        These examples are all identical:


          (ring), \1
          (ring), \g1
          (ring), \g{1}


-       An unsigned number specifies an absolute reference without the  ambigu-
+       An  unsigned number specifies an absolute reference without the ambigu-
        ity that is present in the older syntax. It is also useful when literal
        digits follow the reference. A negative number is a relative reference.
        Consider this example:
@@ -4350,33 +4440,33 @@
          (abc(def)ghi)\g{-1}


        The sequence \g{-1} is a reference to the most recently started captur-
-       ing subpattern before \g, that is, is it equivalent to  \2.  Similarly,
+       ing  subpattern  before \g, that is, is it equivalent to \2. Similarly,
        \g{-2} would be equivalent to \1. The use of relative references can be
-       helpful in long patterns, and also in  patterns  that  are  created  by
+       helpful  in  long  patterns,  and  also in patterns that are created by
        joining together fragments that contain references within themselves.


-       A  back  reference matches whatever actually matched the capturing sub-
-       pattern in the current subject string, rather  than  anything  matching
+       A back reference matches whatever actually matched the  capturing  sub-
+       pattern  in  the  current subject string, rather than anything matching
        the subpattern itself (see "Subpatterns as subroutines" below for a way
        of doing that). So the pattern


          (sens|respons)e and \1ibility


-       matches "sense and sensibility" and "response and responsibility",  but
-       not  "sense and responsibility". If caseful matching is in force at the
-       time of the back reference, the case of letters is relevant. For  exam-
+       matches  "sense and sensibility" and "response and responsibility", but
+       not "sense and responsibility". If caseful matching is in force at  the
+       time  of the back reference, the case of letters is relevant. For exam-
        ple,


          ((?i)rah)\s+\1


-       matches  "rah  rah"  and  "RAH RAH", but not "RAH rah", even though the
+       matches "rah rah" and "RAH RAH", but not "RAH  rah",  even  though  the
        original capturing subpattern is matched caselessly.
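
The back-reference behaviour described above can be tried out with a minimal sketch. It uses Python's re module rather than PCRE itself, which is an assumption made for convenience; the two engines should agree on these particular patterns. Recent Python versions do not accept an inline (?i) in the middle of a pattern, so the scoped form (?i:...) stands in for it here. The helper name whole_match is hypothetical.

```python
import re

# A back reference matches the text the group actually captured,
# not anything else that the group's subpattern could match.
sense = re.compile(r"(sens|respons)e and \1ibility")

# The group is matched caselessly, but the back reference itself is
# caseful, so it must repeat the captured text exactly.
rah = re.compile(r"((?i:rah))\s+\1")


def whole_match(pattern, subject):
    """Return True if the entire subject matches the compiled pattern."""
    return pattern.fullmatch(subject) is not None


if __name__ == "__main__":
    for text in ("sense and sensibility",
                 "response and responsibility",
                 "sense and responsibility"):
        print(text, whole_match(sense, text))
    for text in ("rah rah", "RAH RAH", "RAH rah"):
        print(text, whole_match(rah, text))
```

Running the script prints one line per subject, which makes it easy to compare the results with the cases listed in the text.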


-       There are several different ways of writing back  references  to  named
-       subpatterns.  The  .NET syntax \k{name} and the Perl syntax \k<name> or
-       \k'name' are supported, as is the Python syntax (?P=name). Perl  5.10's
+       There  are  several  different ways of writing back references to named
+       subpatterns. The .NET syntax \k{name} and the Perl syntax  \k<name>  or
+       \k'name'  are supported, as is the Python syntax (?P=name). Perl 5.10's
        unified back reference syntax, in which \g can be used for both numeric
-       and named references, is also supported. We  could  rewrite  the  above
+       and  named  references,  is  also supported. We could rewrite the above
        example in any of the following ways:


          (?<p1>(?i)rah)\s+\k<p1>
@@ -4384,23 +4474,26 @@
          (?P<p1>(?i)rah)\s+(?P=p1)
          (?<p1>(?i)rah)\s+\g{p1}


-       A  subpattern  that  is  referenced  by  name may appear in the pattern
+       A subpattern that is referenced by  name  may  appear  in  the  pattern
        before or after the reference.


-       There may be more than one back reference to the same subpattern. If  a
-       subpattern  has  not actually been used in a particular match, any back
-       references to it always fail. For example, the pattern
+       There  may be more than one back reference to the same subpattern. If a
+       subpattern has not actually been used in a particular match,  any  back
+       references to it always fail by default. For example, the pattern


          (a|(bc))\2


-       always fails if it starts to match "a" rather than "bc". Because  there
-       may  be  many  capturing parentheses in a pattern, all digits following
-       the backslash are taken as part of a potential back  reference  number.
-       If the pattern continues with a digit character, some delimiter must be
-       used to terminate the back reference. If the  PCRE_EXTENDED  option  is
-       set,  this  can  be  whitespace.  Otherwise an empty comment (see "Com-
-       ments" below) can be used.
+       always  fails  if  it starts to match "a" rather than "bc". However, if
+       the PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back refer-
+       ence to an unset value matches an empty string.
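
A small sketch of the default rule for unset groups, again using Python's re module as a stand-in for PCRE (an assumption; Python follows the default behaviour, not the JavaScript-compatible one). The helper name match_at_start is hypothetical.

```python
import re

# Group 2 is set only when the "bc" alternative is taken, so \2 can
# succeed only in that case.
unset_ref = re.compile(r"(a|(bc))\2")


def match_at_start(subject):
    """Return the matched text when the pattern matches at offset 0, else None."""
    found = unset_ref.match(subject)
    return found.group(0) if found else None


if __name__ == "__main__":
    for text in ("a", "aa", "bcbc"):
        print(repr(text), match_at_start(text))
```

Only the subject that goes through the "bc" branch can match, because in the other branch the reference points at a group that never captured anything.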


+       Because  there may be many capturing parentheses in a pattern, all dig-
+       its following a backslash are taken as part of a potential back  refer-
+       ence  number.   If  the  pattern continues with a digit character, some
+       delimiter must  be  used  to  terminate  the  back  reference.  If  the
+       PCRE_EXTENDED option is set, this can be whitespace. Otherwise, the \g{
+       syntax or an empty comment (see "Comments" below) can be used.
+
        A back reference that occurs inside the parentheses to which it  refers
        fails  when  the subpattern is first used, so, for example, (a\1) never
        matches.  However, such references can be useful inside  repeated  sub-
@@ -4462,19 +4555,20 @@
        If you want to force a matching failure at some point in a pattern, the
        most convenient way to do it is  with  (?!)  because  an  empty  string
        always  matches, so an assertion that requires there not to be an empty
-       string must always fail.
+       string must always fail.   The  Perl  5.10  backtracking  control  verb
+       (*FAIL) or (*F) is essentially a synonym for (?!).


    Lookbehind assertions


-       Lookbehind assertions start with (?<= for positive assertions and  (?<!
+       Lookbehind  assertions start with (?<= for positive assertions and (?<!
        for negative assertions. For example,


          (?<!foo)bar


-       does  find  an  occurrence  of "bar" that is not preceded by "foo". The
-       contents of a lookbehind assertion are restricted  such  that  all  the
+       does find an occurrence of "bar" that is not  preceded  by  "foo".  The
+       contents  of  a  lookbehind  assertion are restricted such that all the
        strings it matches must have a fixed length. However, if there are sev-
-       eral top-level alternatives, they do not all  have  to  have  the  same
+       eral  top-level  alternatives,  they  do  not all have to have the same
        fixed length. Thus


          (?<=bullock|donkey)
@@ -4483,62 +4577,62 @@


          (?<!dogs?|cats?)


-       causes  an  error at compile time. Branches that match different length
-       strings are permitted only at the top level of a lookbehind  assertion.
-       This  is an extension compared with Perl (5.8 and 5.10), which requires
+       causes an error at compile time. Branches that match  different  length
+       strings  are permitted only at the top level of a lookbehind assertion.
+       This is an extension compared with Perl (5.8 and 5.10), which  requires
        all branches to match the same length of string. An assertion such as


          (?<=ab(c|de))


-       is not permitted, because its single top-level  branch  can  match  two
+       is  not  permitted,  because  its single top-level branch can match two
        different lengths, but it is acceptable to PCRE if rewritten to use two
        top-level branches:


          (?<=abc|abde)


        In some cases, the Perl 5.10 escape sequence \K (see above) can be used
-       instead  of  a  lookbehind  assertion  to  get  round  the fixed-length
+       instead of  a  lookbehind  assertion  to  get  round  the  fixed-length
        restriction.


-       The implementation of lookbehind assertions is, for  each  alternative,
-       to  temporarily  move the current position back by the fixed length and
+       The  implementation  of lookbehind assertions is, for each alternative,
+       to temporarily move the current position back by the fixed  length  and
        then try to match. If there are insufficient characters before the cur-
        rent position, the assertion fails.


        PCRE does not allow the \C escape (which matches a single byte in UTF-8
-       mode) to appear in lookbehind assertions, because it makes it  impossi-
-       ble  to  calculate the length of the lookbehind. The \X and \R escapes,
+       mode)  to appear in lookbehind assertions, because it makes it impossi-
+       ble to calculate the length of the lookbehind. The \X and  \R  escapes,
        which can match different numbers of bytes, are also not permitted.


-       "Subroutine" calls (see below) such as (?2) or (?&X) are  permitted  in
-       lookbehinds,  as  long as the subpattern matches a fixed-length string.
+       "Subroutine"  calls  (see below) such as (?2) or (?&X) are permitted in
+       lookbehinds, as long as the subpattern matches a  fixed-length  string.
        Recursion, however, is not supported.


-       Possessive quantifiers can  be  used  in  conjunction  with  lookbehind
-       assertions  to  specify  efficient  matching  at the end of the subject
-       string. Consider a simple pattern such as
+       Possessive  quantifiers  can  be  used  in  conjunction with lookbehind
+       assertions to specify efficient matching of fixed-length strings at the
+       end of subject strings. Consider a simple pattern such as


          abcd$


-       when applied to a long string that does  not  match.  Because  matching
+       when  applied  to  a  long string that does not match. Because matching
        proceeds from left to right, PCRE will look for each "a" in the subject
-       and then see if what follows matches the rest of the  pattern.  If  the
+       and  then  see  if what follows matches the rest of the pattern. If the
        pattern is specified as


          ^.*abcd$


-       the  initial .* matches the entire string at first, but when this fails
+       the initial .* matches the entire string at first, but when this  fails
        (because there is no following "a"), it backtracks to match all but the
-       last  character,  then all but the last two characters, and so on. Once
-       again the search for "a" covers the entire string, from right to  left,
+       last character, then all but the last two characters, and so  on.  Once
+       again  the search for "a" covers the entire string, from right to left,
        so we are no better off. However, if the pattern is written as


          ^.*+(?<=abcd)


-       there  can  be  no backtracking for the .*+ item; it can match only the
-       entire string. The subsequent lookbehind assertion does a  single  test
-       on  the last four characters. If it fails, the match fails immediately.
-       For long strings, this approach makes a significant difference  to  the
+       there can be no backtracking for the .*+ item; it can  match  only  the
+       entire  string.  The subsequent lookbehind assertion does a single test
+       on the last four characters. If it fails, the match fails  immediately.
+       For  long  strings, this approach makes a significant difference to the
        processing time.


    Using multiple assertions
@@ -4547,18 +4641,18 @@


          (?<=\d{3})(?<!999)foo


-       matches  "foo" preceded by three digits that are not "999". Notice that
-       each of the assertions is applied independently at the  same  point  in
-       the  subject  string.  First  there  is a check that the previous three
-       characters are all digits, and then there is  a  check  that  the  same
+       matches "foo" preceded by three digits that are not "999". Notice  that
+       each  of  the  assertions is applied independently at the same point in
+       the subject string. First there is a  check  that  the  previous  three
+       characters  are  all  digits,  and  then there is a check that the same
        three characters are not "999".  This pattern does not match "foo" pre-
-       ceded by six characters, the first of which are  digits  and  the  last
-       three  of  which  are not "999". For example, it doesn't match "123abc-
+       ceded  by  six  characters,  the first of which are digits and the last
+       three of which are not "999". For example, it  doesn't  match  "123abc-
        foo". A pattern to do that is


          (?<=\d{3}...)(?<!999)foo


-       This time the first assertion looks at the  preceding  six  characters,
+       This  time  the  first assertion looks at the preceding six characters,
        checking that the first three are digits, and then the second assertion
        checks that the preceding three characters are not "999".
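
The difference between the two patterns above can be checked with a short sketch. It uses Python's re module instead of PCRE (an assumption; both lookbehinds here are fixed-length, so Python accepts them). The names three_digits, digits_then_any and found are hypothetical.

```python
import re

# Both assertions look at the same three characters before "foo".
three_digits = re.compile(r"(?<=\d{3})(?<!999)foo")

# The first assertion looks back six characters (three digits, then any
# three characters); the second looks at the last three only.
digits_then_any = re.compile(r"(?<=\d{3}...)(?<!999)foo")


def found(pattern, subject):
    """Return True if the pattern matches anywhere in the subject."""
    return pattern.search(subject) is not None


if __name__ == "__main__":
    for text in ("123foo", "999foo", "123abcfoo", "123999foo"):
        print(text, found(three_digits, text), found(digits_then_any, text))
```

Each assertion is evaluated at the same position, just before "foo", which is why the first pattern rejects "123abcfoo" while the second accepts it.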


@@ -4566,43 +4660,46 @@

          (?<=(?<!foo)bar)baz


-       matches an occurrence of "baz" that is preceded by "bar" which in  turn
+       matches  an occurrence of "baz" that is preceded by "bar" which in turn
        is not preceded by "foo", while


          (?<=\d{3}(?!999)...)foo


-       is  another pattern that matches "foo" preceded by three digits and any
+       is another pattern that matches "foo" preceded by three digits and  any
        three characters that are not "999".



CONDITIONAL SUBPATTERNS

-       It is possible to cause the matching process to obey a subpattern  con-
-       ditionally  or to choose between two alternative subpatterns, depending
-       on the result of an assertion, or whether a previous capturing  subpat-
-       tern  matched  or not. The two possible forms of conditional subpattern
-       are
+       It  is possible to cause the matching process to obey a subpattern con-
+       ditionally or to choose between two alternative subpatterns,  depending
+       on  the result of an assertion, or whether a specific capturing subpat-
+       tern has already been matched. The two possible  forms  of  conditional
+       subpattern are:


          (?(condition)yes-pattern)
          (?(condition)yes-pattern|no-pattern)


-       If the condition is satisfied, the yes-pattern is used;  otherwise  the
-       no-pattern  (if  present)  is used. If there are more than two alterna-
+       If  the  condition is satisfied, the yes-pattern is used; otherwise the
+       no-pattern (if present) is used. If there are more  than  two  alterna-
        tives in the subpattern, a compile-time error occurs.


-       There are four kinds of condition: references  to  subpatterns,  refer-
+       There  are  four  kinds of condition: references to subpatterns, refer-
        ences to recursion, a pseudo-condition called DEFINE, and assertions.
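
As an illustration of the first kind of condition (a reference to a subpattern), here is a minimal sketch in Python's re module, used as a stand-in for PCRE (an assumption). Python writes the named forms as (?P<OPEN>...) and (?(OPEN)...), which differs from the PCRE spellings; the helper name whole is hypothetical.

```python
import re

# Group 1 optionally captures "("; the condition (?(1)...) then demands
# a closing ")" only when that group took part in the match.
by_number = re.compile(r"(\()?[^()]+(?(1)\))")

# The same test by name, in Python's spelling of the syntax.
by_name = re.compile(r"(?P<OPEN>\()?[^()]+(?(OPEN)\))")


def whole(pattern, subject):
    """Return True if the entire subject matches the compiled pattern."""
    return pattern.fullmatch(subject) is not None


if __name__ == "__main__":
    for text in ("abc", "(abc)", "(abc", "abc)"):
        print(repr(text), whole(by_number, text), whole(by_name, text))
```

With no no-pattern given, an unsatisfied condition matches nothing, so an unparenthesized subject is accepted while a half-parenthesized one is not.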


    Checking for a used subpattern by number


-       If  the  text between the parentheses consists of a sequence of digits,
-       the condition is true if the capturing subpattern of  that  number  has
-       previously  matched.  An  alternative notation is to precede the digits
-       with a plus or minus sign. In this case, the subpattern number is rela-
-       tive rather than absolute.  The most recently opened parentheses can be
-       referenced by (?(-1), the next most recent by (?(-2),  and  so  on.  In
-       looping constructs it can also make sense to refer to subsequent groups
-       with constructs such as (?(+2).
+       If the text between the parentheses consists of a sequence  of  digits,
+       the condition is true if a capturing subpattern of that number has pre-
+       viously matched. If there is more than one  capturing  subpattern  with
+       the  same  number  (see  the earlier section about duplicate subpattern
+       numbers), the condition is true if any of them have been set. An alter-
+       native  notation is to precede the digits with a plus or minus sign. In
+       this case, the subpattern number is relative rather than absolute.  The
+       most  recently opened parentheses can be referenced by (?(-1), the next
+       most recent by (?(-2), and so on. In looping  constructs  it  can  also
+       make  sense  to  refer  to  subsequent  groups  with constructs such as
+       (?(+2).


        Consider the following pattern, which  contains  non-significant  white
        space to make it more readable (assume the PCRE_EXTENDED option) and to
@@ -4645,6 +4742,9 @@


          (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )


+       If  the  name used in a condition of this kind is a duplicate, the test
+       is applied to all subpatterns of the same name, and is true if any  one
+       of them has matched.


    Checking for pattern recursion


@@ -4655,12 +4755,14 @@

          (?(R3)...) or (?(R&name)...)


-       the condition is true if the most recent recursion is into the  subpat-
-       tern  whose  number or name is given. This condition does not check the
-       entire recursion stack.
+       the condition is true if the most recent recursion is into a subpattern
+       whose number or name is given. This condition does not check the entire
+       recursion stack. If the name used in a condition  of  this  kind  is  a
+       duplicate, the test is applied to all subpatterns of the same name, and
+       is true if any one of them is the most recent recursion.


-       At "top level", all these recursion test conditions are false.   Recur-
-       sive patterns are described below.
+       At "top level", all these recursion test  conditions  are  false.   The
+       syntax for recursive patterns is described below.


    Defining subpatterns for use by reference only


@@ -4680,12 +4782,10 @@
        group  named "byte" is defined. This matches an individual component of
        an IPv4 address (a number less than 256). When  matching  takes  place,
        this  part  of  the pattern is skipped because DEFINE acts like a false
-       condition.
+       condition. The rest of the pattern uses references to the  named  group
+       to  match the four dot-separated components of an IPv4 address, insist-
+       ing on a word boundary at each end.


-       The rest of the pattern uses references to the named group to match the
-       four  dot-separated  components of an IPv4 address, insisting on a word
-       boundary at each end.
-
    Assertion conditions


        If the condition is not in any of the above  formats,  it  must  be  an
@@ -4752,24 +4852,26 @@
        This PCRE pattern solves the nested  parentheses  problem  (assume  the
        PCRE_EXTENDED option is set so that white space is ignored):


-         \( ( (?>[^()]+) | (?R) )* \)
+         \( ( [^()]++ | (?R) )* \)


        First  it matches an opening parenthesis. Then it matches any number of
        substrings which can either be a  sequence  of  non-parentheses,  or  a
        recursive  match  of the pattern itself (that is, a correctly parenthe-
-       sized substring).  Finally there is a closing parenthesis.
+       sized substring).  Finally there is a closing parenthesis. Note the use
+       of a possessive quantifier to avoid backtracking into sequences of non-
+       parentheses.
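
Python's re module has no recursion syntax, so the following is not the PCRE pattern itself but a hand-written sketch of what it does: match "(", then any mix of non-parenthesis runs and nested groups, then ")". The function name match_nested is hypothetical. Because each run of non-parentheses is consumed in one go and never given back, as with the possessive quantifier, a failing subject is rejected after a single pass.

```python
def match_nested(subject, pos=0):
    """Return the offset just past a balanced group starting at pos, or -1."""
    if pos >= len(subject) or subject[pos] != "(":
        return -1
    pos += 1                                  # the opening \(
    while pos < len(subject):
        ch = subject[pos]
        if ch == ")":
            return pos + 1                    # the closing \)
        if ch == "(":
            pos = match_nested(subject, pos)  # plays the role of (?R)
            if pos < 0:
                return -1
        else:
            # Plays the role of [^()]++ : take the whole run, no backtracking.
            while pos < len(subject) and subject[pos] not in "()":
                pos += 1
    return -1                                 # ran out of subject before \)


if __name__ == "__main__":
    for text in ("(ab(cd)ef)", "()", "(a(b)", "(" + "a" * 50 + "()"):
        print(repr(text[:20]), match_nested(text))
```

The last subject in the loop has the same shape as a long unbalanced string of letters followed by an empty group, and the function reports failure without trying alternative ways of dividing the letters.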


        If this were part of a larger pattern, you would not  want  to  recurse
        the entire pattern, so instead you could use this:


-         ( \( ( (?>[^()]+) | (?1) )* \) )
+         ( \( ( [^()]++ | (?1) )* \) )


        We  have  put the pattern into parentheses, and caused the recursion to
        refer to them instead of the whole pattern.


        In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
-       tricky.  This is made easier by the use of relative references. (A Perl
-       5.10 feature.)  Instead of (?1) in the  pattern  above  you  can  write
+       tricky.  This  is made easier by the use of relative references (a Perl
+       5.10 feature).  Instead of (?1) in the  pattern  above  you  can  write
        (?-2) to refer to the second most recently opened parentheses preceding
        the recursion. In other  words,  a  negative  number  counts  capturing
        parentheses leftwards from the point at which it is encountered.
@@ -4784,23 +4886,23 @@
        syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also
        supported. We could rewrite the above example as follows:


-         (?<pn> \( ( (?>[^()]+) | (?&pn) )* \) )
+         (?<pn> \( ( [^()]++ | (?&pn) )* \) )


        If  there  is more than one subpattern with the same name, the earliest
        one is used.


        This particular example pattern that we have been looking  at  contains
-       nested  unlimited repeats, and so the use of atomic grouping for match-
-       ing strings of non-parentheses is important when applying  the  pattern
-       to strings that do not match. For example, when this pattern is applied
-       to
+       nested unlimited repeats, and so the use of a possessive quantifier for
+       matching strings of non-parentheses is important when applying the pat-
+       tern  to  strings  that do not match. For example, when this pattern is
+       applied to


          (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()


-       it yields "no match" quickly. However, if atomic grouping is not  used,
-       the  match  runs  for a very long time indeed because there are so many
-       different ways the + and * repeats can carve up the  subject,  and  all
-       have to be tested before failure can be reported.
+       it yields "no match" quickly. However, if a  possessive  quantifier  is
+       not  used, the match runs for a very long time indeed because there are
+       so many different ways the + and * repeats can carve  up  the  subject,
+       and all have to be tested before failure can be reported.


        At the end of a match, the values set for any capturing subpatterns are
        those from the outermost level of the recursion at which the subpattern
@@ -4814,7 +4916,7 @@
        value taken on at the top level. If additional parentheses  are  added,
        giving


-         \( ( ( (?>[^()]+) | (?R) )* ) \)
+         \( ( ( [^()]++ | (?R) )* ) \)
             ^                        ^
             ^                        ^


@@ -4894,7 +4996,7 @@
        If you want to match typical palindromic phrases, the  pattern  has  to
        ignore all non-word characters, which can be done like this:


-         ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+4|\W*+.\W*+))\W*+$
+         ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$


        If run with the PCRE_CASELESS option, this pattern matches phrases such
        as "A man, a plan, a canal: Panama!" and it works well in both PCRE and
@@ -4903,7 +5005,15 @@
        great  deal  longer  (ten  times or more) to match typical phrases, and
        Perl takes so long that you think it has gone into a loop.


+       WARNING: The palindrome-matching patterns above work only if  the  sub-
+       ject  string  does not start with a palindrome that is shorter than the
+       entire string.  For example, although "abcba" is correctly matched,  if
+       the  subject  is "ababa", PCRE finds the palindrome "aba" at the start,
+       then fails at top level because the end of the string does not  follow.
+       Once  again, it cannot jump back into the recursion to try other alter-
+       natives, so the entire match fails.


+
SUBPATTERNS AS SUBROUTINES

        If the syntax for a recursive subpattern reference (either by number or
@@ -5034,8 +5144,8 @@


        This verb causes the match to end successfully, skipping the  remainder
        of  the pattern. When inside a recursion, only the innermost pattern is
-       ended immediately. If the (*ACCEPT) is  inside  capturing  parentheses,
-       the data so far is captured. (This feature was added to PCRE at release
+       ended immediately. If (*ACCEPT) is inside  capturing  parentheses,  the
+       data  so  far  is  captured. (This feature was added to PCRE at release
        8.00.) For example:


          A((?:A|B(*ACCEPT)|C)D)
@@ -5068,9 +5178,9 @@


        This verb causes the whole match to fail outright if the  rest  of  the
        pattern  does  not match. Even if the pattern is unanchored, no further
-       attempts to find a match by advancing the start point take place.  Once
-       (*COMMIT)  has been passed, pcre_exec() is committed to finding a match
-       at the current starting point, or not at all. For example:
+       attempts to find a match by advancing the starting  point  take  place.
+       Once  (*COMMIT)  has been passed, pcre_exec() is committed to finding a
+       match at the current starting point, or not at all. For example:


          a+(*COMMIT)b


@@ -5102,7 +5212,7 @@
        If  the  subject  is  "aaaac...",  after  the first match attempt fails
        (starting at the first character in the  string),  the  starting  point
        skips on to start the next attempt at "c". Note that a possessive quan-
-       tifer does not have the same effect in this example; although it  would
+       tifer does not have the same effect as this example; although it  would
        suppress  backtracking  during  the  first  match  attempt,  the second
        attempt would start at the second character instead of skipping  on  to
        "c".
@@ -5125,7 +5235,7 @@


SEE ALSO

-       pcreapi(3), pcrecallout(3), pcrematching(3), pcre(3).
+       pcreapi(3), pcrecallout(3), pcrematching(3), pcresyntax(3), pcre(3).



AUTHOR
@@ -5137,11 +5247,11 @@

REVISION

-       Last updated: 22 September 2009
+       Last updated: 04 October 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRESYNTAX(3)                                                    PCRESYNTAX(3)



@@ -5493,8 +5603,8 @@
        Last updated: 11 April 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREPARTIAL(3)                                                  PCREPARTIAL(3)



@@ -5533,154 +5643,157 @@
        plete  match,  though the details differ between the two matching func-
        tions. If both options are set, PCRE_PARTIAL_HARD takes precedence.


-       Setting a partial matching option disables one of PCRE's optimizations.
+       Setting a partial matching option disables two of PCRE's optimizations.
        PCRE  remembers the last literal byte in a pattern, and abandons match-
        ing immediately if such a byte is not present in  the  subject  string.
        This  optimization cannot be used for a subject string that might match
-       only partially.
+       only partially. If the pattern was  studied,  PCRE  knows  the  minimum
+       length  of  a  matching string, and does not bother to run the matching
+       function on shorter strings. This optimization  is  also  disabled  for
+       partial matching.



PARTIAL MATCHING USING pcre_exec()

        A partial match occurs during a call to pcre_exec() whenever the end of
-       the  subject  string  is reached successfully, but matching cannot con-
+       the subject string is reached successfully, but  matching  cannot  con-
        tinue because more characters are needed. However, at least one charac-
-       ter  must have been matched. (In other words, a partial match can never
+       ter must have been matched. (In other words, a partial match can  never
        be an empty string.)


-       If PCRE_PARTIAL_SOFT is set,  the  partial  match  is  remembered,  but
+       If  PCRE_PARTIAL_SOFT  is  set,  the  partial  match is remembered, but
        matching continues as normal, and other alternatives in the pattern are
-       tried.  If  no  complete  match  can  be  found,  pcre_exec()   returns
+       tried.   If  no  complete  match  can  be  found,  pcre_exec()  returns
        PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH. If there are at least
        two slots in the offsets vector, the first of them is set to the offset
        of the earliest character that was inspected when the partial match was
-       found. For convenience, the second offset points  to  the  end  of  the
-       string so that a substring can easily be extracted.
+       found.  For  convenience,  the  second  offset points to the end of the
+       string so that a substring can easily be identified.


-       For  the majority of patterns, the first offset identifies the start of
-       the partially matched string. However, for patterns that contain  look-
-       behind  assertions,  or  \K, or begin with \b or \B, earlier characters
+       For the majority of patterns, the first offset identifies the start  of
+       the  partially matched string. However, for patterns that contain look-
+       behind assertions, or \K, or begin with \b or  \B,  earlier  characters
        have been inspected while carrying out the match. For example:


          /(?<=abc)123/


        This pattern matches "123", but only if it is preceded by "abc". If the
        subject string is "xyzabc12", the offsets after a partial match are for
-       the substring "abc12", because  all  these  characters  are  needed  if
+       the  substring  "abc12",  because  all  these  characters are needed if
        another match is tried with extra characters added.


-       If  there  is more than one partial match, the first one that was found
+       If there is more than one partial match, the first one that  was  found
        provides the data that is returned. Consider this pattern:


          /123\w+X|dogY/


-       If this is matched against the subject string "abc123dog", both  alter-
-       natives  fail  to  match,  but the end of the subject is reached during
-       matching,   so    PCRE_ERROR_PARTIAL    is    returned    instead    of
-       PCRE_ERROR_NOMATCH.  The  offsets  are  set  to  3  and  9, identifying
-       "123dog" as the first partial match that was found. (In  this  example,
-       there  are  two  partial  matches,  because  "dog" on its own partially
+       If  this is matched against the subject string "abc123dog", both alter-
+       natives fail to match, but the end of the  subject  is  reached  during
+       matching,    so    PCRE_ERROR_PARTIAL    is    returned    instead   of
+       PCRE_ERROR_NOMATCH. The  offsets  are  set  to  3  and  9,  identifying
+       "123dog"  as  the first partial match that was found. (In this example,
+       there are two partial matches,  because  "dog"  on  its  own  partially
        matches the second alternative.)


        If PCRE_PARTIAL_HARD is set for pcre_exec(), it returns PCRE_ERROR_PAR-
-       TIAL  as soon as a partial match is found, without continuing to search
-       for possible complete matches. The difference between the  two  options
+       TIAL as soon as a partial match is found, without continuing to  search
+       for  possible  complete matches. The difference between the two options
        can be illustrated by a pattern such as:


          /dog(sbody)?/


-       This  matches either "dog" or "dogsbody", greedily (that is, it prefers
-       the longer string if possible). If it is  matched  against  the  string
-       "dog"  with  PCRE_PARTIAL_SOFT,  it  yields a complete match for "dog".
+       This matches either "dog" or "dogsbody", greedily (that is, it  prefers
+       the  longer  string  if  possible). If it is matched against the string
+       "dog" with PCRE_PARTIAL_SOFT, it yields a  complete  match  for  "dog".
        However, if PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL.
-       On  the  other hand, if the pattern is made ungreedy the result is dif-
+       On the other hand, if the pattern is made ungreedy the result  is  dif-
        ferent:


          /dog(sbody)??/


-       In this case the result is always a complete match because  pcre_exec()
-       finds  that  first,  and  it  never continues after finding a match. It
-       might be easier to follow this explanation by thinking of the two  pat-
+       In  this case the result is always a complete match because pcre_exec()
+       finds that first, and it never continues  after  finding  a  match.  It
+       might  be easier to follow this explanation by thinking of the two pat-
        terns like this:


          /dog(sbody)?/    is the same as  /dogsbody|dog/
          /dog(sbody)??/   is the same as  /dog|dogsbody/


-       The  second  pattern  will  never  match "dogsbody" when pcre_exec() is
+       The second pattern will never  match  "dogsbody"  when  pcre_exec()  is
        used, because it will always find the shorter match first.



PARTIAL MATCHING USING pcre_dfa_exec()

-       The pcre_dfa_exec() function moves along the subject  string  character
-       by  character, without backtracking, searching for all possible matches
-       simultaneously. If the end of the subject is reached before the end  of
-       the  pattern,  there  is the possibility of a partial match, again pro-
+       The  pcre_dfa_exec()  function moves along the subject string character
+       by character, without backtracking, searching for all possible  matches
+       simultaneously.  If the end of the subject is reached before the end of
+       the pattern, there is the possibility of a partial  match,  again  pro-
        vided that at least one character has matched.


-       When PCRE_PARTIAL_SOFT is set, PCRE_ERROR_PARTIAL is returned  only  if
-       there  have  been  no complete matches. Otherwise, the complete matches
-       are returned.  However, if PCRE_PARTIAL_HARD is set,  a  partial  match
-       takes  precedence  over any complete matches. The portion of the string
-       that was inspected when the longest partial match was found is  set  as
+       When  PCRE_PARTIAL_SOFT  is set, PCRE_ERROR_PARTIAL is returned only if
+       there have been no complete matches. Otherwise,  the  complete  matches
+       are  returned.   However,  if PCRE_PARTIAL_HARD is set, a partial match
+       takes precedence over any complete matches. The portion of  the  string
+       that  was  inspected when the longest partial match was found is set as
        the first matching string, provided there are at least two slots in the
        offsets vector.


-       Because pcre_dfa_exec() always searches for all possible  matches,  and
-       there  is no difference between greedy and ungreedy repetition, its be-
+       Because  pcre_dfa_exec()  always searches for all possible matches, and
+       there is no difference between greedy and ungreedy repetition, its  be-
        haviour is different from pcre_exec when PCRE_PARTIAL_HARD is set. Con-
-       sider  the  string  "dog"  matched  against  the ungreedy pattern shown
+       sider the string "dog"  matched  against  the  ungreedy  pattern  shown
        above:


          /dog(sbody)??/


-       Whereas pcre_exec() stops as soon as it finds the  complete  match  for
+       Whereas  pcre_exec()  stops  as soon as it finds the complete match for
        "dog", pcre_dfa_exec() also finds the partial match for "dogsbody", and
        so returns that when PCRE_PARTIAL_HARD is set.



PARTIAL MATCHING AND WORD BOUNDARIES

-       If a pattern ends with one of sequences \b or \B, which test  for  word
-       boundaries,  partial  matching with PCRE_PARTIAL_SOFT can give counter-
+       If  a  pattern ends with one of sequences \b or \B, which test for word
+       boundaries, partial matching with PCRE_PARTIAL_SOFT can  give  counter-
        intuitive results. Consider this pattern:


          /\bcat\b/


        This matches "cat", provided there is a word boundary at either end. If
        the subject string is "the cat", the comparison of the final "t" with a
-       following character cannot take place, so a  partial  match  is  found.
-       However,  pcre_exec() carries on with normal matching, which matches \b
-       at the end of the subject when the last character  is  a  letter,  thus
+       following  character  cannot  take  place, so a partial match is found.
+       However, pcre_exec() carries on with normal matching, which matches  \b
+       at  the  end  of  the subject when the last character is a letter, thus
        finding a complete match. The result, therefore, is not PCRE_ERROR_PAR-
-       TIAL. The same thing happens  with  pcre_dfa_exec(),  because  it  also
+       TIAL.  The  same  thing  happens  with pcre_dfa_exec(), because it also
        finds the complete match.


-       Using  PCRE_PARTIAL_HARD  in  this  case does yield PCRE_ERROR_PARTIAL,
+       Using PCRE_PARTIAL_HARD in this  case  does  yield  PCRE_ERROR_PARTIAL,
        because then the partial match takes precedence.



FORMERLY RESTRICTED PATTERNS

        For releases of PCRE prior to 8.00, because of the way certain internal
-       optimizations   were  implemented  in  the  pcre_exec()  function,  the
-       PCRE_PARTIAL option (predecessor of  PCRE_PARTIAL_SOFT)  could  not  be
-       used  with all patterns. From release 8.00 onwards, the restrictions no
-       longer apply, and partial matching with pcre_exec()  can  be  requested
+       optimizations  were  implemented  in  the  pcre_exec()  function,   the
+       PCRE_PARTIAL  option  (predecessor  of  PCRE_PARTIAL_SOFT) could not be
+       used with all patterns. From release 8.00 onwards, the restrictions  no
+       longer  apply,  and  partial matching with pcre_exec() can be requested
        for any pattern.


        Items that were formerly restricted were repeated single characters and
-       repeated metasequences. If PCRE_PARTIAL was set for a pattern that  did
-       not  conform  to  the restrictions, pcre_exec() returned the error code
-       PCRE_ERROR_BADPARTIAL (-13). This error code is no longer in  use.  The
-       PCRE_INFO_OKPARTIAL  call  to pcre_fullinfo() to find out if a compiled
+       repeated  metasequences. If PCRE_PARTIAL was set for a pattern that did
+       not conform to the restrictions, pcre_exec() returned  the  error  code
+       PCRE_ERROR_BADPARTIAL  (-13).  This error code is no longer in use. The
+       PCRE_INFO_OKPARTIAL call to pcre_fullinfo() to find out if  a  compiled
        pattern can be used for partial matching now always returns 1.



EXAMPLE OF PARTIAL MATCHING USING PCRETEST

-       If the escape sequence \P is present  in  a  pcretest  data  line,  the
-       PCRE_PARTIAL_SOFT  option  is  used  for  the  match.  Here is a run of
+       If  the  escape  sequence  \P  is  present in a pcretest data line, the
+       PCRE_PARTIAL_SOFT option is used for  the  match.  Here  is  a  run  of
        pcretest that uses the date example quoted above:


            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@@ -5696,24 +5809,24 @@
          data> j\P
          No match


-       The first data string is matched  completely,  so  pcretest  shows  the
-       matched  substrings.  The  remaining four strings do not match the com-
+       The  first  data  string  is  matched completely, so pcretest shows the
+       matched substrings. The remaining four strings do not  match  the  com-
        plete pattern, but the first two are partial matches. Similar output is
        obtained when pcre_dfa_exec() is used.


-       If  the escape sequence \P is present more than once in a pcretest data
+       If the escape sequence \P is present more than once in a pcretest  data
        line, the PCRE_PARTIAL_HARD option is set for the match.



MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()

        When a partial match has been found using pcre_dfa_exec(), it is possi-
-       ble  to  continue  the  match  by providing additional subject data and
-       calling pcre_dfa_exec() again with the same  compiled  regular  expres-
-       sion,  this time setting the PCRE_DFA_RESTART option. You must pass the
+       ble to continue the match by  providing  additional  subject  data  and
+       calling  pcre_dfa_exec()  again  with the same compiled regular expres-
+       sion, this time setting the PCRE_DFA_RESTART option. You must pass  the
        same working space as before, because this is where details of the pre-
-       vious  partial  match  are  stored.  Here is an example using pcretest,
-       using the \R escape sequence to set  the  PCRE_DFA_RESTART  option  (\D
+       vious partial match are stored. Here  is  an  example  using  pcretest,
+       using  the  \R  escape  sequence to set the PCRE_DFA_RESTART option (\D
        specifies the use of pcre_dfa_exec()):


            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@@ -5722,26 +5835,26 @@
          data> n05\R\D
           0: n05


-       The  first  call has "23ja" as the subject, and requests partial match-
-       ing; the second call  has  "n05"  as  the  subject  for  the  continued
-       (restarted)  match.   Notice  that when the match is complete, only the
-       last part is shown; PCRE does  not  retain  the  previously  partially-
-       matched  string. It is up to the calling program to do that if it needs
+       The first call has "23ja" as the subject, and requests  partial  match-
+       ing;  the  second  call  has  "n05"  as  the  subject for the continued
+       (restarted) match.  Notice that when the match is  complete,  only  the
+       last  part  is  shown;  PCRE  does not retain the previously partially-
+       matched string. It is up to the calling program to do that if it  needs
        to.


-       You can set the PCRE_PARTIAL_SOFT  or  PCRE_PARTIAL_HARD  options  with
-       PCRE_DFA_RESTART  to  continue partial matching over multiple segments.
-       This facility can  be  used  to  pass  very  long  subject  strings  to
+       You  can  set  the  PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with
+       PCRE_DFA_RESTART to continue partial matching over  multiple  segments.
+       This  facility  can  be  used  to  pass  very  long  subject strings to
        pcre_dfa_exec().



MULTI-SEGMENT MATCHING WITH pcre_exec()

-       From  release  8.00,  pcre_exec()  can also be used to do multi-segment
-       matching. Unlike pcre_dfa_exec(), it is not  possible  to  restart  the
-       previous  match  with  a new segment of data. Instead, new data must be
-       added to the previous subject string,  and  the  entire  match  re-run,
-       starting  from the point where the partial match occurred. Earlier data
+       From release 8.00, pcre_exec() can also be  used  to  do  multi-segment
+       matching.  Unlike  pcre_dfa_exec(),  it  is not possible to restart the
+       previous match with a new segment of data. Instead, new  data  must  be
+       added  to  the  previous  subject  string, and the entire match re-run,
+       starting from the point where the partial match occurred. Earlier  data
        can be discarded.  Consider an unanchored pattern that matches dates:


            re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
@@ -5749,15 +5862,15 @@
          Partial match: 23ja


        At this stage, an application could discard the text preceding "23ja",
-       add  on  text from the next segment, and call pcre_exec() again. Unlike
-       pcre_dfa_exec(), the entire matching string must always  be  available,
-       and  the complete matching process occurs for each call, so more memory
+       add on text from the next segment, and call pcre_exec()  again.  Unlike
+       pcre_dfa_exec(),  the  entire matching string must always be available,
+       and the complete matching process occurs for each call, so more  memory
        and more processing time are needed.


-       Note: If the pattern contains lookbehind assertions, or \K,  or  starts
-       with  \b  or  \B,  the string that is returned for a partial match will
-       include characters that precede the partially  matched  string  itself,
-       because  these  must  be  retained when adding on more characters for a
+       Note:  If  the pattern contains lookbehind assertions, or \K, or starts
+       with \b or \B, the string that is returned for  a  partial  match  will
+       include  characters  that  precede the partially matched string itself,
+       because these must be retained when adding on  more  characters  for  a
        subsequent matching attempt.



@@ -5766,28 +5879,28 @@
        Certain types of pattern may give problems with multi-segment matching,
        whichever matching function is used.


-       1.  If  the  pattern contains tests for the beginning or end of a line,
-       you need to pass the PCRE_NOTBOL or PCRE_NOTEOL options,  as  appropri-
-       ate,  when  the subject string for any call does not contain the begin-
+       1. If the pattern contains tests for the beginning or end  of  a  line,
+       you  need  to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropri-
+       ate, when the subject string for any call does not contain  the  begin-
        ning or end of a line.


-       2. Lookbehind assertions at the start of a pattern are catered  for  in
-       the  offsets that are returned for a partial match. However, in theory,
-       a lookbehind assertion later in the pattern could require even  earlier
-       characters  to  be inspected, and it might not have been reached when a
-       partial match occurs. This is probably an extremely unlikely case;  you
-       could  guard  against  it to a certain extent by always including extra
+       2.  Lookbehind  assertions at the start of a pattern are catered for in
+       the offsets that are returned for a partial match. However, in  theory,
+       a  lookbehind assertion later in the pattern could require even earlier
+       characters to be inspected, and it might not have been reached  when  a
+       partial  match occurs. This is probably an extremely unlikely case; you
+       could guard against it to a certain extent by  always  including  extra
        characters at the start.


-       3. Matching a subject string that is split into multiple  segments  may
-       not  always produce exactly the same result as matching over one single
-       long string, especially when PCRE_PARTIAL_SOFT  is  used.  The  section
-       "Partial  Matching  and  Word Boundaries" above describes an issue that
-       arises if the pattern ends with \b or \B. Another  kind  of  difference
-       may  occur  when  there  are multiple matching possibilities, because a
+       3.  Matching  a subject string that is split into multiple segments may
+       not always produce exactly the same result as matching over one  single
+       long  string,  especially  when  PCRE_PARTIAL_SOFT is used. The section
+       "Partial Matching and Word Boundaries" above describes  an  issue  that
+       arises  if  the  pattern ends with \b or \B. Another kind of difference
+       may occur when there are multiple  matching  possibilities,  because  a
        partial match result is given only when there are no completed matches.
        This means that as soon as the shortest match has been found, continua-
-       tion to a new subject segment is no longer  possible.   Consider  again
+       tion  to  a  new subject segment is no longer possible.  Consider again
        this pcretest example:


            re> /dog(sbody)?/
@@ -5801,17 +5914,17 @@
           0: dogsbody
           1: dog


-       The  first  data line passes the string "dogsb" to pcre_exec(), setting
-       the PCRE_PARTIAL_SOFT option. Although the string is  a  partial  match
-       for  "dogsbody",  the  result  is  not  PCRE_ERROR_PARTIAL, because the
-       shorter string "dog" is a complete match. Similarly, when  the  subject
-       is  presented to pcre_dfa_exec() in several parts ("do" and "gsb" being
+       The first data line passes the string "dogsb" to  pcre_exec(),  setting
+       the  PCRE_PARTIAL_SOFT  option.  Although the string is a partial match
+       for "dogsbody", the  result  is  not  PCRE_ERROR_PARTIAL,  because  the
+       shorter  string  "dog" is a complete match. Similarly, when the subject
+       is presented to pcre_dfa_exec() in several parts ("do" and "gsb"  being
        the first two) the match stops when "dog" has been found, and it is not
-       possible  to continue. On the other hand, if "dogsbody" is presented as
+       possible to continue. On the other hand, if "dogsbody" is presented  as
        a single string, pcre_dfa_exec() finds both matches.


        Because of these problems, it is probably best to use PCRE_PARTIAL_HARD
-       when  matching  multi-segment data. The example above then behaves dif-
+       when matching multi-segment data. The example above then  behaves  dif-
        ferently:


            re> /dog(sbody)?/
@@ -5824,25 +5937,25 @@



        4. Patterns that contain alternatives at the top level which do not all
-       start  with  the  same  pattern  item  may  not  work  as expected when
+       start with the  same  pattern  item  may  not  work  as  expected  when
        pcre_dfa_exec() is used. For example, consider this pattern:


          1234|3789


-       If the first part of the subject is "ABC123", a partial  match  of  the
-       first  alternative  is found at offset 3. There is no partial match for
+       If  the  first  part of the subject is "ABC123", a partial match of the
+       first alternative is found at offset 3. There is no partial  match  for
        the second alternative, because such a match does not start at the same
-       point  in  the  subject  string. Attempting to continue with the string
-       "7890" does not yield a match  because  only  those  alternatives  that
-       match  at  one  point in the subject are remembered. The problem arises
-       because the start of the second alternative matches  within  the  first
-       alternative.  There  is  no  problem with anchored patterns or patterns
+       point in the subject string. Attempting to  continue  with  the  string
+       "7890"  does  not  yield  a  match because only those alternatives that
+       match at one point in the subject are remembered.  The  problem  arises
+       because  the  start  of the second alternative matches within the first
+       alternative. There is no problem with  anchored  patterns  or  patterns
        such as:


          1234|ABCD


-       where no string can be a partial match for both alternatives.  This  is
-       not  a  problem if pcre_exec() is used, because the entire match has to
+       where  no  string can be a partial match for both alternatives. This is
+       not a problem if pcre_exec() is used, because the entire match  has  to
        be rerun each time:


            re> /1234|3789/
@@ -5861,11 +5974,11 @@


REVISION

-       Last updated: 05 September 2009
+       Last updated: 29 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREPRECOMPILE(3)                                            PCREPRECOMPILE(3)



@@ -5988,8 +6101,8 @@
        Last updated: 13 June 2007
        Copyright (c) 1997-2007 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREPERFORM(3)                                                  PCREPERFORM(3)



@@ -6138,8 +6251,8 @@
        Last updated: 06 March 2007
        Copyright (c) 1997-2007 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREPOSIX(3)                                                      PCREPOSIX(3)



@@ -6394,8 +6507,8 @@
        Last updated: 02 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRECPP(3)                                                          PCRECPP(3)



@@ -6735,8 +6848,8 @@

        Last updated: 17 March 2009
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRESAMPLE(3)                                                    PCRESAMPLE(3)



@@ -6765,8 +6878,8 @@
        is going on.


        If  PCRE  is  installed in the standard include and library directories
-       for your system, you should be able to compile the  demonstration  pro-
-       gram using this command:
+       for your operating system, you should be able to compile the demonstra-
+       tion program using this command:


          gcc -o pcredemo pcredemo.c -lpcre


@@ -6813,7 +6926,7 @@

REVISION

-       Last updated: 01 September 2009
+       Last updated: 30 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
 PCRESTACK(3)                                                      PCRESTACK(3)
@@ -6952,5 +7065,5 @@
        Last updated: 09 July 2008
        Copyright (c) 1997-2008 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
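
The pcrepartial text in the diff above says that for multi-segment matching
with pcre_exec(), new data must be added to the previous subject string, and
that text preceding the partial match can be discarded. What follows is a
minimal sketch of that buffer step only, in C. The helper name carry_over()
and all of its parameters are hypothetical and are not part of PCRE;
keep_from stands for the first offset reported alongside PCRE_ERROR_PARTIAL
(ovector[0]). No PCRE function is called here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a PCRE function). After a partial match,
 * drop the bytes before keep_from, slide the retained tail to the
 * front of buf, and append the next segment behind it.
 *
 *   buf, cap   subject buffer and its capacity in bytes
 *   len        number of bytes of subject currently held in buf
 *   keep_from  offset of the first byte to retain; assumed to be the
 *              first offset returned for the partial match
 *   next       next segment of input, next_len bytes long
 *
 * Returns the new subject length, or (size_t)-1 if the retained tail
 * plus the new segment would not fit in buf.
 */
static size_t carry_over(char *buf, size_t cap, size_t len,
                         size_t keep_from,
                         const char *next, size_t next_len)
{
    size_t kept;

    assert(keep_from <= len && len <= cap);

    kept = len - keep_from;
    if (next_len > cap - kept)
        return (size_t)-1;

    /* The source and destination may overlap, so use memmove here. */
    memmove(buf, buf + keep_from, kept);
    memcpy(buf + kept, next, next_len);
    return kept + next_len;
}
```

A caller would then run the whole match again on the first N bytes of buf,
where N is the returned length. Because the retained tail starts at the
earliest inspected character, any characters needed by a lookbehind or \b at
the start of the pattern are still present, as the note in the text requires.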


Modified: code/trunk/doc/pcre_compile2.3
===================================================================
--- code/trunk/doc/pcre_compile2.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcre_compile2.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -49,7 +49,7 @@
   PCRE_JAVASCRIPT_COMPAT  JavaScript compatibility
   PCRE_MULTILINE          ^ and $ match newlines within data
   PCRE_NEWLINE_ANY        Recognize any Unicode newline sequence
-  PCRE_NEWLINE_ANYCRLF    Recognize CR, LF, and CRLF as newline 
+  PCRE_NEWLINE_ANYCRLF    Recognize CR, LF, and CRLF as newline
                             sequences
   PCRE_NEWLINE_CR         Set CR as the newline sequence
   PCRE_NEWLINE_CRLF       Set CRLF as the newline sequence


Modified: code/trunk/doc/pcre_dfa_exec.3
===================================================================
--- code/trunk/doc/pcre_dfa_exec.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcre_dfa_exec.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -57,8 +57,8 @@
                            was set at compile time)
   PCRE_PARTIAL           ) Return PCRE_ERROR_PARTIAL for a partial
   PCRE_PARTIAL_SOFT      )   match if no full matches are found
-  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match 
-                           even if there is a full match as well 
+  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match
+                           even if there is a full match as well
   PCRE_DFA_SHORTEST      Return only the shortest match
   PCRE_DFA_RESTART       Restart after a partial match
 .sp


Modified: code/trunk/doc/pcre_exec.3
===================================================================
--- code/trunk/doc/pcre_exec.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcre_exec.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -52,8 +52,8 @@
                            was set at compile time)
   PCRE_PARTIAL           ) Return PCRE_ERROR_PARTIAL for a partial
   PCRE_PARTIAL_SOFT      )   match if no full matches are found
-  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match 
-                           even if there is a full match as well 
+  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match
+                           even if there is a full match as well
 .sp
 For details of partial matching, see the
 .\" HREF


Modified: code/trunk/doc/pcre_fullinfo.3
===================================================================
--- code/trunk/doc/pcre_fullinfo.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcre_fullinfo.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -33,7 +33,7 @@
   PCRE_INFO_FIRSTTABLE      Table of first bytes (after studying)
   PCRE_INFO_JCHANGED        Return 1 if (?J) or (?-J) was used
   PCRE_INFO_LASTLITERAL     Literal last byte required
-  PCRE_INFO_MINLENGTH       Lower bound length of matching strings 
+  PCRE_INFO_MINLENGTH       Lower bound length of matching strings
   PCRE_INFO_NAMECOUNT       Number of named subpatterns
   PCRE_INFO_NAMEENTRYSIZE   Size of name table entry
   PCRE_INFO_NAMETABLE       Pointer to name table


Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcreapi.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -395,8 +395,8 @@
 Either of the functions \fBpcre_compile()\fP or \fBpcre_compile2()\fP can be
 called to compile a pattern into an internal form. The only difference between
 the two interfaces is that \fBpcre_compile2()\fP has an additional argument,
-\fIerrorcodeptr\fP, via which a numerical error code can be returned. To avoid 
-too much repetition, we refer just to \fBpcre_compile()\fP below, but the 
+\fIerrorcodeptr\fP, via which a numerical error code can be returned. To avoid
+too much repetition, we refer just to \fBpcre_compile()\fP below, but the
 information applies equally to \fBpcre_compile2()\fP.
 .P
 The pattern is a C string terminated by a binary zero, and is passed in the
@@ -421,7 +421,7 @@
 .\"
 documentation). For those options that can be different in different parts of
 the pattern, the contents of the \fIoptions\fP argument specifies their
-settings at the start of compilation and execution. The PCRE_ANCHORED, 
+settings at the start of compilation and execution. The PCRE_ANCHORED,
 PCRE_BSR_\fIxxx\fP, and PCRE_NEWLINE_\fIxxx\fP options can be set at the time
 of matching as well as at compile time.
 .P
@@ -785,7 +785,7 @@
 .P
 If studying the pattern does not produce any useful information,
 \fBpcre_study()\fP returns NULL. In that circumstance, if the calling program
-wants to pass any of the other fields to \fBpcre_exec()\fP or 
+wants to pass any of the other fields to \fBpcre_exec()\fP or
 \fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block.
 .P
 The second argument of \fBpcre_study()\fP contains option bits. At present, no
@@ -807,16 +807,16 @@
     &error);        /* set to NULL or points to a message */
 .sp
 Studying a pattern does two things: first, a lower bound for the length of
-subject string that is needed to match the pattern is computed. This does not 
-mean that there are any strings of that length that match, but it does 
-guarantee that no shorter strings match. The value is used by 
-\fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP to avoid wasting time by trying to 
-match strings that are shorter than the lower bound. You can find out the value 
+subject string that is needed to match the pattern is computed. This does not
+mean that there are any strings of that length that match, but it does
+guarantee that no shorter strings match. The value is used by
+\fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP to avoid wasting time by trying to
+match strings that are shorter than the lower bound. You can find out the value
 in a calling program via the \fBpcre_fullinfo()\fP function.
 .P
 Studying a pattern is also useful for non-anchored patterns that do not have a
 single fixed starting character. A bitmap of possible starting bytes is
-created. This speeds up finding a position in the subject at which to start 
+created. This speeds up finding a position in the subject at which to start
 matching.
 .
 .
@@ -1012,7 +1012,7 @@
 length of the longest name. PCRE_INFO_NAMETABLE returns a pointer to the first
 entry of the table (a pointer to \fBchar\fP). The first two bytes of each entry
 are the number of the capturing parenthesis, most significant byte first. The
-rest of the entry is the corresponding name, zero terminated. 
+rest of the entry is the corresponding name, zero terminated.
 .P
 The names are in alphabetical order. Duplicate names may appear if (?| is used
 to create multiple groups with the same number, as described in the
@@ -1024,10 +1024,10 @@
 .\" HREF
 \fBpcrepattern\fP
 .\"
-page. Duplicate names for subpatterns with different numbers are permitted only 
-if PCRE_DUPNAMES is set. In all cases of duplicate names, they appear in the 
-table in the order in which they were found in the pattern. In the absence of 
-(?| this is the order of increasing number; when (?| is used this is not 
+page. Duplicate names for subpatterns with different numbers are permitted only
+if PCRE_DUPNAMES is set. In all cases of duplicate names, they appear in the
+table in the order in which they were found in the pattern. In the absence of
+(?| this is the order of increasing number; when (?| is used this is not
 necessarily the case because later subpatterns may have lower numbers.
 .P
 As a simple example of the name/number table, consider the following pattern
@@ -1371,7 +1371,7 @@
 .sp
   PCRE_NOTEMPTY_ATSTART
 .sp
-This is like PCRE_NOTEMPTY, except that an empty string match that is not at 
+This is like PCRE_NOTEMPTY, except that an empty string match that is not at
 the start of the subject is permitted. If the pattern is anchored, such a match
 can occur only if the pattern contains \eK.
 .P
@@ -1427,7 +1427,7 @@
 subject, or a value of \fIstartoffset\fP that does not point to the start of a
 UTF-8 character, is undefined. Your program may crash.
 .sp
-  PCRE_PARTIAL_HARD 
+  PCRE_PARTIAL_HARD
   PCRE_PARTIAL_SOFT
 .sp
 These options turn on the partial matching feature. For backwards
@@ -1634,7 +1634,7 @@
 .sp
 This code is no longer in use. It was formerly returned when the PCRE_PARTIAL
 option was used with a compiled pattern containing items that were not
-supported for partial matching. From release 8.00 onwards, there are no 
+supported for partial matching. From release 8.00 onwards, there are no
 restrictions on partial matching.
 .sp
   PCRE_ERROR_INTERNAL       (-14)
@@ -1898,7 +1898,7 @@
 just once, and does not backtrack. This has different characteristics to the
 normal algorithm, and is not compatible with Perl. Some of the features of PCRE
 patterns are not supported. Nevertheless, there are times when this kind of
-matching can be useful. For a discussion of the two matching algorithms, and a 
+matching can be useful. For a discussion of the two matching algorithms, and a
 list of features that \fBpcre_dfa_exec()\fP does not support, see the
 .\" HREF
 \fBpcrematching\fP
@@ -1944,7 +1944,7 @@
 for \fBpcre_exec()\fP, so their description is not repeated here.
 .sp
   PCRE_PARTIAL_HARD
-  PCRE_PARTIAL_SOFT 
+  PCRE_PARTIAL_SOFT
 .sp
 These have the same general effect as they do for \fBpcre_exec()\fP, but the
 details are slightly different. When PCRE_PARTIAL_HARD is set for
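
Note on the name table described in the pcreapi.3 hunks above: each entry holds the capturing group number in its first two bytes, most significant byte first, followed by the zero-terminated name. A minimal C sketch of reading such a table follows. It does not call PCRE. The helper names (entry_group_number, entry_name, find_group) and the contents of sample_table are hypothetical, made up for illustration. In a real program the table pointer, entry count, and entry size would presumably come from pcre_fullinfo() with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE.

```c
#include <assert.h>
#include <string.h>

/* Decode the group number held in the first two bytes of one entry.
   The most significant byte comes first. */
int entry_group_number(const unsigned char *entry)
{
    return (entry[0] << 8) | entry[1];
}

/* The name starts right after the two number bytes and is zero terminated. */
const char *entry_name(const unsigned char *entry)
{
    return (const char *)(entry + 2);
}

/* Look a name up by walking the table entry by entry.
   Returns the group number of the first entry with that name, or -1.
   (Hypothetical helper: table, name_count and entry_size stand in for
   the values a caller would obtain from pcre_fullinfo().) */
int find_group(const unsigned char *table, int name_count,
               int entry_size, const char *name)
{
    int i;
    assert(entry_size >= 3);   /* two number bytes plus a terminating zero */
    for (i = 0; i < name_count; i++) {
        const unsigned char *entry = table + i * entry_size;
        if (strcmp(entry_name(entry), name) == 0)
            return entry_group_number(entry);
    }
    return -1;
}

/* Hand-built sample data, NOT output from PCRE: three entries of 8 bytes
   each (longest name "month" = 5 bytes, plus 2 number bytes, plus 1 zero).
   The names are in alphabetical order, as the documentation describes. */
const unsigned char sample_table[] = {
    0, 1, 'd', 'a', 'y', 0,   0,   0,    /* group 1   */
    0, 3, 'm', 'o', 'n', 't', 'h', 0,    /* group 3   */
    1, 2, 'y', 'e', 'a', 'r', 0,   0     /* group 258 */
};
```

With this sample data, find_group(sample_table, 3, 8, "month") yields 3. When duplicate names are present, this sketch simply returns the first entry it meets in table order.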


Modified: code/trunk/doc/pcrebuild.3
===================================================================
--- code/trunk/doc/pcrebuild.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcrebuild.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -12,11 +12,11 @@
 \fBconfigure\fP before running the \fBmake\fP command. However, the same
 options can be selected in both Unix-like and non-Unix-like environments using
 the GUI facility of \fBcmake-gui\fP if you are using \fBCMake\fP instead of
-\fBconfigure\fP to build PCRE. 
+\fBconfigure\fP to build PCRE.
 .P
-There is a lot more information about building PCRE in non-Unix-like 
-environments in the file called \fINON_UNIX_USE\fP, which is part of the PCRE 
-distribution. You should consult this file as well as the \fIREADME\fP file if 
+There is a lot more information about building PCRE in non-Unix-like
+environments in the file called \fINON_UNIX_USE\fP, which is part of the PCRE
+distribution. You should consult this file as well as the \fIREADME\fP file if
 you are building in a non-Unix-like environment.
 .P
 The complete list of options for \fBconfigure\fP (which includes the standard


Modified: code/trunk/doc/pcrecallout.3
===================================================================
--- code/trunk/doc/pcrecallout.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcrecallout.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -19,7 +19,7 @@
 .sp
   (?C1)abc(?C2)def
 .sp
-If the PCRE_AUTO_CALLOUT option bit is set when \fBpcre_compile()\fP or 
+If the PCRE_AUTO_CALLOUT option bit is set when \fBpcre_compile()\fP or
 \fBpcre_compile2()\fP is called, PCRE automatically inserts callouts, all with
 number 255, before each item in the pattern. For example, if PCRE_AUTO_CALLOUT
 is used with the pattern


Modified: code/trunk/doc/pcrecompat.3
===================================================================
--- code/trunk/doc/pcrecompat.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcrecompat.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -77,7 +77,7 @@
 documentation for details.
 .P
 9. Subpatterns that are called recursively or as "subroutines" are always
-treated as atomic groups in PCRE. This is like Python, but unlike Perl. There 
+treated as atomic groups in PCRE. This is like Python, but unlike Perl. There
 is a discussion of an example that explains this in more detail in the
 .\" HTML <a href="pcrepattern.html#recursiondifference">
 .\" </a>
@@ -97,7 +97,7 @@
 (*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but only in the forms without an
 argument. PCRE does not support (*MARK).
 .P
-12. PCRE's handling of duplicate subpattern numbers and duplicate subpattern 
+12. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
 names is not as general as Perl's. This is a consequence of the fact the PCRE
 works internally just with numbers, using an external table to translate
 between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b)B),


Modified: code/trunk/doc/pcregrep.1
===================================================================
--- code/trunk/doc/pcregrep.1    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcregrep.1    2009-10-05 10:59:35 UTC (rev 461)
@@ -89,9 +89,9 @@
 .SH OPTIONS
 .rs
 .sp
-The order in which some of the options appear can affect the output. For 
-example, both the \fB-h\fP and \fB-l\fP options affect the printing of file 
-names. Whichever comes later in the command line will be the one that takes 
+The order in which some of the options appear can affect the output. For
+example, both the \fB-h\fP and \fB-l\fP options affect the printing of file
+names. Whichever comes later in the command line will be the one that takes
 effect.
 .TP 10
 \fB--\fP
@@ -272,9 +272,9 @@
 Instead of outputting lines from the files, just output the names of the files
 containing lines that would have been output. Each file name is output
 once, on a separate line. Searching normally stops as soon as a matching line
-is found in a file. However, if the \fB-c\fP (count) option is also used, 
-matching continues in order to obtain the correct count, and those files that 
-have at least one match are listed along with their counts. Using this option 
+is found in a file. However, if the \fB-c\fP (count) option is also used,
+matching continues in order to obtain the correct count, and those files that
+have at least one match are listed along with their counts. Using this option
 with \fB-c\fP is a way of suppressing the listing of files with no matches.
 .TP
 \fB--label\fP=\fIname\fP
@@ -410,8 +410,8 @@
 as in the GNU \fBgrep\fP program. Any long option of the form
 \fB--xxx-regexp\fP (GNU terminology) is also available as \fB--xxx-regex\fP
 (PCRE terminology). However, the \fB--locale\fP, \fB-M\fP, \fB--multiline\fP,
-\fB-u\fP, and \fB--utf-8\fP options are specific to \fBpcregrep\fP. If both the 
-\fB-c\fP and \fB-l\fP options are given, GNU grep lists only file names, 
+\fB-u\fP, and \fB--utf-8\fP options are specific to \fBpcregrep\fP. If both the
+\fB-c\fP and \fB-l\fP options are given, GNU grep lists only file names,
 without counts, but \fBpcregrep\fP gives the counts.
 .
 .


Modified: code/trunk/doc/pcrematching.3
===================================================================
--- code/trunk/doc/pcrematching.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcrematching.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -74,9 +74,9 @@
 traditional finite state machine (it keeps multiple states active
 simultaneously).
 .P
-Although the general principle of this matching algorithm is that it scans the 
-subject string only once, without backtracking, there is one exception: when a 
-lookaround assertion is encountered, the characters following or preceding the 
+Although the general principle of this matching algorithm is that it scans the
+subject string only once, without backtracking, there is one exception: when a
+lookaround assertion is encountered, the characters following or preceding the
 current point have to be independently inspected.
 .P
 The scan continues until either the end of the subject is reached, or there are
@@ -152,9 +152,9 @@
 never needs to backtrack, it is possible to pass very long subject strings to
 the matching function in several pieces, checking for partial matching each
 time. The
-.\" HREF                                                                
+.\" HREF
 \fBpcrepartial\fP
-.\"                                                           
+.\"
 documentation gives details of partial matching.
 .
 .


Modified: code/trunk/doc/pcrepartial.3
===================================================================
--- code/trunk/doc/pcrepartial.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcrepartial.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -35,9 +35,9 @@
 Setting a partial matching option disables two of PCRE's optimizations. PCRE
 remembers the last literal byte in a pattern, and abandons matching immediately
 if such a byte is not present in the subject string. This optimization cannot
-be used for a subject string that might match only partially. If the pattern 
-was studied, PCRE knows the minimum length of a matching string, and does not 
-bother to run the matching function on shorter strings. This optimization is 
+be used for a subject string that might match only partially. If the pattern
+was studied, PCRE knows the minimum length of a matching string, and does not
+bother to run the matching function on shorter strings. This optimization is
 also disabled for partial matching.
 .
 .


Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcrepattern.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -21,7 +21,7 @@
 description of PCRE's regular expressions is intended as reference material.
 .P
 The original operation of PCRE was on strings of one-byte characters. However,
-there is now also support for UTF-8 character strings. To use this, 
+there is now also support for UTF-8 character strings. To use this,
 PCRE must be built to include UTF-8 support, and you must call
 \fBpcre_compile()\fP or \fBpcre_compile2()\fP with the PCRE_UTF8 option. There
 is also a special sequence that can be given at the start of a pattern:
@@ -83,7 +83,7 @@
   (*ANYCRLF)   any of the three above
   (*ANY)       all Unicode newline sequences
 .sp
-These override the default and the options given to \fBpcre_compile()\fP or 
+These override the default and the options given to \fBpcre_compile()\fP or
 \fBpcre_compile2()\fP. For example, on a Unix system where LF is the default
 newline sequence, the pattern
 .sp
@@ -333,7 +333,7 @@
 later.
 .\"
 Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP
-synonymous. The former is a back reference; the latter is a 
+synonymous. The former is a back reference; the latter is a
 .\" HTML <a href="#subpatternsassubroutines">
 .\" </a>
 subroutine
@@ -468,7 +468,7 @@
   (*BSR_ANYCRLF)   CR, LF, or CRLF only
   (*BSR_UNICODE)   any Unicode newline sequence
 .sp
-These override the default and the options given to \fBpcre_compile()\fP or 
+These override the default and the options given to \fBpcre_compile()\fP or
 \fBpcre_compile2()\fP, but they can be overridden by options given to
 \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. Note that these special settings,
 which are not Perl-compatible, are recognized only at the very start of a
@@ -741,9 +741,9 @@
 A word boundary is a position in the subject string where the current character
 and the previous character do not both match \ew or \eW (i.e. one matches
 \ew and the other matches \eW), or the start or end of the string if the
-first or last character matches \ew, respectively. Neither PCRE nor Perl has a 
-separte "start of word" or "end of word" metasequence. However, whatever 
-follows \eb normally determines which it is. For example, the fragment 
+first or last character matches \ew, respectively. Neither PCRE nor Perl has a
+separte "start of word" or "end of word" metasequence. However, whatever
+follows \eb normally determines which it is. For example, the fragment
 \eba matches "a" at the start of a word.
 .P
 The \eA, \eZ, and \ez assertions differ from the traditional circumflex and
@@ -876,8 +876,8 @@
 .rs
 .sp
 An opening square bracket introduces a character class, terminated by a closing
-square bracket. A closing square bracket on its own is not special by default. 
-However, if the PCRE_JAVASCRIPT_COMPAT option is set, a lone closing square 
+square bracket. A closing square bracket on its own is not special by default.
+However, if the PCRE_JAVASCRIPT_COMPAT option is set, a lone closing square
 bracket causes a compile-time error. If a closing square bracket is required as
 a member of the class, it should be the first data character in the class
 (after an initial circumflex, if present) or escaped with a backslash.
@@ -1163,14 +1163,14 @@
   / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
   # 1            2         2  3        2     3     4
 .sp
-A backreference to a numbered subpattern uses the most recent value that is set 
+A backreference to a numbered subpattern uses the most recent value that is set
 for that number by any subpattern. The following pattern matches "abcabc" or
 "defdef":
 .sp
-  /(?|(abc)|(def))\1/
+  /(?|(abc)|(def))\e1/
 .sp
 In contrast, a recursive or "subroutine" call to a numbered subpattern always
-refers to the first one in the pattern with the given number. The following 
+refers to the first one in the pattern with the given number. The following
 pattern matches "abcabc" or "defabc":
 .sp
   /(?|(abc)|(def))(?1)/
@@ -1225,7 +1225,7 @@
 .P
 By default, a name must be unique within a pattern, but it is possible to relax
 this constraint by setting the PCRE_DUPNAMES option at compile time. (Duplicate
-names are also always permitted for subpatterns with the same number, set up as 
+names are also always permitted for subpatterns with the same number, set up as
 described in the previous section.) Duplicate names can be useful for patterns
 where only one instance of the named parentheses can match. Suppose you want to
 match the name of a weekday, either as a 3-letter abbreviation or as the full
@@ -1244,7 +1244,7 @@
 .P
 The convenience function for extracting the data by name returns the substring
 for the first (and in this example, the only) subpattern of that name that
-matched. This saves searching to find which numbered subpattern it was. 
+matched. This saves searching to find which numbered subpattern it was.
 .P
 If you make a backreference to a non-unique named subpattern from elsewhere in
 the pattern, the one that corresponds to the first occurrence of the name is
@@ -1256,7 +1256,7 @@
 .\" </a>
 section about conditions
 .\"
-below), either to check whether a subpattern has matched, or to check for 
+below), either to check whether a subpattern has matched, or to check for
 recursion, all subpatterns with the same name are tested. If the condition is
 true for any one of them, the overall condition is true. This is the same
 behaviour as testing by number. For further details of the interfaces for
@@ -1288,7 +1288,7 @@
   a character class
   a back reference (see next section)
   a parenthesized subpattern (unless it is an assertion)
-  a recursive or "subroutine" call to a subpattern 
+  a recursive or "subroutine" call to a subpattern
 .sp
 The general repetition quantifier specifies a minimum and maximum number of
 permitted matches, by giving the two numbers in curly brackets (braces),
@@ -1614,8 +1614,8 @@
 .sp
   (a|(bc))\e2
 .sp
-always fails if it starts to match "a" rather than "bc". However, if the 
-PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back reference to an 
+always fails if it starts to match "a" rather than "bc". However, if the
+PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back reference to an
 unset value matches an empty string.
 .P
 Because there may be many capturing parentheses in a pattern, all digits
@@ -1737,7 +1737,7 @@
 .\" </a>
 (see above)
 .\"
-can be used instead of a lookbehind assertion to get round the fixed-length 
+can be used instead of a lookbehind assertion to get round the fixed-length
 restriction.
 .P
 The implementation of lookbehind assertions is, for each alternative, to
@@ -1755,7 +1755,7 @@
 "Subroutine"
 .\"
 calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long
-as the subpattern matches a fixed-length string. 
+as the subpattern matches a fixed-length string.
 .\" HTML <a href="#recursion">
 .\" </a>
 Recursion,
@@ -1828,7 +1828,7 @@
 .sp
 It is possible to cause the matching process to obey a subpattern
 conditionally or to choose between two alternative subpatterns, depending on
-the result of an assertion, or whether a specific capturing subpattern has 
+the result of an assertion, or whether a specific capturing subpattern has
 already been matched. The two possible forms of conditional subpattern are:
 .sp
   (?(condition)yes-pattern)
@@ -1846,8 +1846,8 @@
 .sp
 If the text between the parentheses consists of a sequence of digits, the
 condition is true if a capturing subpattern of that number has previously
-matched. If there is more than one capturing subpattern with the same number 
-(see the earlier 
+matched. If there is more than one capturing subpattern with the same number
+(see the earlier
 .\"
 .\" HTML <a href="#recursion">
 .\" </a>
@@ -1899,8 +1899,8 @@
 .sp
   (?<OPEN> \e( )?    [^()]+    (?(<OPEN>) \e) )
 .sp
-If the name used in a condition of this kind is a duplicate, the test is 
-applied to all subpatterns of the same name, and is true if any one of them has 
+If the name used in a condition of this kind is a duplicate, the test is
+applied to all subpatterns of the same name, and is true if any one of them has
 matched.
 .
 .SS "Checking for pattern recursion"
@@ -1915,11 +1915,11 @@
 .sp
 the condition is true if the most recent recursion is into a subpattern whose
 number or name is given. This condition does not check the entire recursion
-stack. If the name used in a condition of this kind is a duplicate, the test is 
-applied to all subpatterns of the same name, and is true if any one of them is 
-the most recent recursion. 
+stack. If the name used in a condition of this kind is a duplicate, the test is
+applied to all subpatterns of the same name, and is true if any one of them is
+the most recent recursion.
 .P
-At "top level", all these recursion test conditions are false. 
+At "top level", all these recursion test conditions are false.
 .\" HTML <a href="#recursion">
 .\" </a>
 The syntax for recursive patterns
@@ -1933,7 +1933,7 @@
 name DEFINE, the condition is always false. In this case, there may be only one
 alternative in the subpattern. It is always skipped if control reaches this
 point in the pattern; the idea of DEFINE is that it can be used to define
-"subroutines" that can be referenced from elsewhere. (The use of 
+"subroutines" that can be referenced from elsewhere. (The use of
 .\" HTML <a href="#subpatternsassubroutines">
 .\" </a>
 "subroutines"
@@ -2010,7 +2010,7 @@
 .P
 A special item that consists of (? followed by a number greater than zero and a
 closing parenthesis is a recursive call of the subpattern of the given number,
-provided that it occurs inside that subpattern. (If not, it is a 
+provided that it occurs inside that subpattern. (If not, it is a
 .\" HTML <a href="#subpatternsassubroutines">
 .\" </a>
 "subroutine"
@@ -2026,7 +2026,7 @@
 First it matches an opening parenthesis. Then it matches any number of
 substrings which can either be a sequence of non-parentheses, or a recursive
 match of the pattern itself (that is, a correctly parenthesized substring).
-Finally there is a closing parenthesis. Note the use of a possessive quantifier 
+Finally there is a closing parenthesis. Note the use of a possessive quantifier
 to avoid backtracking into sequences of non-parentheses.
 .P
 If this were part of a larger pattern, you would not want to recurse the entire
@@ -2117,25 +2117,25 @@
 In PCRE (like Python, but unlike Perl), a recursive subpattern call is always
 treated as an atomic group. That is, once it has matched some of the subject
 string, it is never re-entered, even if it contains untried alternatives and
-there is a subsequent matching failure. This can be illustrated by the 
-following pattern, which purports to match a palindromic string that contains 
+there is a subsequent matching failure. This can be illustrated by the
+following pattern, which purports to match a palindromic string that contains
 an odd number of characters (for example, "a", "aba", "abcba", "abcdcba"):
 .sp
   ^(.|(.)(?1)\e2)$
 .sp
-The idea is that it either matches a single character, or two identical 
-characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE 
+The idea is that it either matches a single character, or two identical
+characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE
 it does not if the pattern is longer than three characters. Consider the
 subject string "abcba":
 .P
-At the top level, the first character is matched, but as it is not at the end 
+At the top level, the first character is matched, but as it is not at the end
 of the string, the first alternative fails; the second alternative is taken
 and the recursion kicks in. The recursive call to subpattern 1 successfully
 matches the next character ("b"). (Note that the beginning and end of line
 tests are not part of the recursion).
 .P
 Back at the top level, the next character ("c") is compared with what
-subpattern 2 matched, which was "a". This fails. Because the recursion is 
+subpattern 2 matched, which was "a". This fails. Because the recursion is
 treated as an atomic group, there are now no backtracking points, and so the
 entire match fails. (Perl is able, at this point, to re-enter the recursion and
 try the second alternative.) However, if the pattern is written with the
@@ -2143,32 +2143,32 @@
 .sp
   ^((.)(?1)\e2|.)$
 .sp
-This time, the recursing alternative is tried first, and continues to recurse 
-until it runs out of characters, at which point the recursion fails. But this 
-time we do have another alternative to try at the higher level. That is the big 
+This time, the recursing alternative is tried first, and continues to recurse
+until it runs out of characters, at which point the recursion fails. But this
+time we do have another alternative to try at the higher level. That is the big
 difference: in the previous case the remaining alternative is at a deeper
 recursion level, which PCRE cannot use.
 .P
-To change the pattern so that matches all palindromic strings, not just those 
+To change the pattern so that matches all palindromic strings, not just those
 with an odd number of characters, it is tempting to change the pattern to this:
 .sp
   ^((.)(?1)\e2|.?)$
 .sp
-Again, this works in Perl, but not in PCRE, and for the same reason. When a 
-deeper recursion has matched a single character, it cannot be entered again in 
-order to match an empty string. The solution is to separate the two cases, and 
+Again, this works in Perl, but not in PCRE, and for the same reason. When a
+deeper recursion has matched a single character, it cannot be entered again in
+order to match an empty string. The solution is to separate the two cases, and
 write out the odd and even cases as alternatives at the higher level:
 .sp
   ^(?:((.)(?1)\e2|)|((.)(?3)\e4|.))
-.sp   
-If you want to match typical palindromic phrases, the pattern has to ignore all 
+.sp
+If you want to match typical palindromic phrases, the pattern has to ignore all
 non-word characters, which can be done like this:
 .sp
-  ^\eW*+(?:((.)\eW*+(?1)\eW*+\e2|)|((.)\eW*+(?3)\eW*+\4|\eW*+.\eW*+))\eW*+$
+  ^\eW*+(?:((.)\eW*+(?1)\eW*+\e2|)|((.)\eW*+(?3)\eW*+\e4|\eW*+.\eW*+))\eW*+$
 .sp
-If run with the PCRE_CASELESS option, this pattern matches phrases such as "A 
-man, a plan, a canal: Panama!" and it works well in both PCRE and Perl. Note 
-the use of the possessive quantifier *+ to avoid backtracking into sequences of 
+If run with the PCRE_CASELESS option, this pattern matches phrases such as "A
+man, a plan, a canal: Panama!" and it works well in both PCRE and Perl. Note
+the use of the possessive quantifier *+ to avoid backtracking into sequences of
 non-word characters. Without this, PCRE takes a great deal longer (ten times or
 more) to match typical phrases, and Perl takes so long that you think it has
 gone into a loop.
@@ -2294,9 +2294,9 @@
 failing negative assertion, they cause an error if encountered by
 \fBpcre_dfa_exec()\fP.
 .P
-If any of these verbs are used in an assertion subpattern, their effect is 
+If any of these verbs are used in an assertion subpattern, their effect is
 confined to that subpattern; it does not extend to the surrounding pattern.
-Note that assertion subpatterns are processed as anchored at the point where 
+Note that assertion subpatterns are processed as anchored at the point where
 they are tested.
 .P
 The new verbs make use of what was previously invalid syntax: an opening
@@ -2319,7 +2319,7 @@
 .sp
   A((?:A|B(*ACCEPT)|C)D)
 .sp
-This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by 
+This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
 the outer parentheses.
 .sp
   (*FAIL) or (*F)
@@ -2400,7 +2400,7 @@
 .SH "SEE ALSO"
 .rs
 .sp
-\fBpcreapi\fP(3), \fBpcrecallout\fP(3), \fBpcrematching\fP(3), 
+\fBpcreapi\fP(3), \fBpcrecallout\fP(3), \fBpcrematching\fP(3),
 \fBpcresyntax\fP(3), \fBpcre\fP(3).
 .
 .
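
Note on the palindromic-phrase pattern quoted in the pcrepattern.3 hunks above: as a cross-check on what that pattern is meant to accept, here is a hand-written C test for the same idea. This is a plain two-pointer loop, not a regular expression and not PCRE code. It assumes single-byte characters in the default C locale, treats letters, digits, and underscore as word characters (the usual meaning of \w), and ignores case as PCRE_CASELESS would. The function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Word characters in the sense of \w: letters, digits, underscore.
   Assumption: single-byte characters, default C locale. */
static int is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Return 1 if the word characters of text read the same in both
   directions, ignoring case and skipping every non-word character;
   return 0 otherwise. */
int is_palindromic_phrase(const char *text)
{
    size_t left = 0;
    size_t right = strlen(text);   /* one past the last character */

    while (left < right) {
        unsigned char lc = (unsigned char)text[left];
        unsigned char rc = (unsigned char)text[right - 1];

        if (!is_word_char(lc)) { left++; continue; }    /* skip on the left  */
        if (!is_word_char(rc)) { right--; continue; }   /* skip on the right */
        if (tolower(lc) != tolower(rc))
            return 0;                                   /* mismatch */
        left++;
        right--;
    }
    return 1;
}
```

Calling is_palindromic_phrase("A man, a plan, a canal: Panama!") returns 1, and is_palindromic_phrase("abca") returns 0.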


Modified: code/trunk/doc/pcreposix.3
===================================================================
--- code/trunk/doc/pcreposix.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcreposix.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -103,9 +103,9 @@
 .sp
   REG_UNGREEDY
 .sp
-The PCRE_UNGREEDY option is set when the regular expression is passed for 
+The PCRE_UNGREEDY option is set when the regular expression is passed for
 compilation to the native function. Note that REG_UNGREEDY is not part of the
-POSIX standard.   
+POSIX standard.
 .sp
   REG_UTF8
 .sp


Modified: code/trunk/doc/pcresample.3
===================================================================
--- code/trunk/doc/pcresample.3    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcresample.3    2009-10-05 10:59:35 UTC (rev 461)
@@ -10,7 +10,7 @@
 .\" HREF
 \fBpcredemo\fP
 .\"
-documentation. If you do not have a copy of the PCRE distribution, you can save 
+documentation. If you do not have a copy of the PCRE distribution, you can save
 this listing to re-create \fIpcredemo.c\fP.
 .P
 The program compiles the regular expression that is its first argument, and
@@ -50,7 +50,7 @@
 \fBpcretest\fP,
 .\"
 which supports many more facilities for testing regular expressions and the
-PCRE library. The 
+PCRE library. The
 .\" HREF
 \fBpcredemo\fP
 .\"


Modified: code/trunk/doc/pcretest.1
===================================================================
--- code/trunk/doc/pcretest.1    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcretest.1    2009-10-05 10:59:35 UTC (rev 461)
@@ -213,7 +213,7 @@
 If any call to \fBpcre_exec()\fP in a \fB/g\fP or \fB/G\fP sequence matches an
 empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and
 PCRE_ANCHORED flags set in order to search for another, non-empty, match at the
-same point. If this second match fails, the start offset is advanced by one 
+same point. If this second match fails, the start offset is advanced by one
 character, and the normal match is retried. This imitates the way Perl handles
 such cases when using the \fB/g\fP modifier or the \fBsplit()\fP function.
 .
@@ -357,14 +357,14 @@
 .\" JOIN
   \eN         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fP
                or \fBpcre_dfa_exec()\fP; if used twice, pass the
-               PCRE_NOTEMPTY_ATSTART option 
+               PCRE_NOTEMPTY_ATSTART option
 .\" JOIN
   \eOdd       set the size of the output vector passed to
                \fBpcre_exec()\fP to dd (any number of digits)
 .\" JOIN
   \eP         pass the PCRE_PARTIAL_SOFT option to \fBpcre_exec()\fP
                or \fBpcre_dfa_exec()\fP; if used twice, pass the
-               PCRE_PARTIAL_HARD option 
+               PCRE_PARTIAL_HARD option
 .\" JOIN
   \eQdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd
                (any number of digits)
@@ -551,7 +551,7 @@
 .sp
 (Using the normal matching function on this data finds only "tang".) The
 longest matching string is always given first (and numbered zero). After a
-PCRE_ERROR_PARTIAL return, the output is "Partial match:", followed by the 
+PCRE_ERROR_PARTIAL return, the output is "Partial match:", followed by the
 partially matching substring.
 .P
 If \fB/g\fP is present on the pattern, the search for further matches resumes
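
Note on the /g handling described in the first pcretest.1 hunk above: after an empty-string match, the next attempt is made at the same point with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED set, and if that fails the start offset moves on by one character and normal matching resumes. A minimal C sketch of that control flow follows. It does not call PCRE: toy_exec() is a hypothetical stand-in for pcre_exec() hard-wired to the pattern a*, and the TOY_ flag values are invented for this sketch. It also assumes one byte per character.

```c
#include <stddef.h>
#include <string.h>

/* Invented flag values for this sketch; the real options live in pcre.h. */
#define TOY_ANCHORED          0x1
#define TOY_NOTEMPTY_ATSTART  0x2

/* Hypothetical stand-in for pcre_exec(), fixed to the pattern a*.
   Returns 1 and sets the match offsets on success, 0 on failure. */
int toy_exec(const char *subject, size_t length, size_t start, int flags,
             size_t *match_start, size_t *match_end)
{
    size_t pos;
    for (pos = start; pos <= length; pos++) {
        size_t end = pos;
        while (end < length && subject[end] == 'a')
            end++;
        /* Reject an empty match at the starting offset if asked to. */
        if (!((flags & TOY_NOTEMPTY_ATSTART) && end == pos && pos == start)) {
            *match_start = pos;
            *match_end = end;
            return 1;
        }
        if (flags & TOY_ANCHORED)
            break;              /* anchored: only the first position counts */
    }
    return 0;
}

/* Count the matches a /g style loop would report for a* on subject. */
int count_global_matches(const char *subject)
{
    size_t length = strlen(subject);
    size_t start = 0;
    int flags = 0;
    int count = 0;

    while (start <= length) {
        size_t ms, me;
        if (toy_exec(subject, length, start, flags, &ms, &me)) {
            count++;
            start = me;
            /* After an empty match, look for a non-empty one at the same point. */
            flags = (ms == me) ? (TOY_ANCHORED | TOY_NOTEMPTY_ATSTART) : 0;
        } else if (flags != 0) {
            flags = 0;          /* the retry failed: advance by one character */
            start++;            /* (one byte here; more in UTF-8 mode)        */
        } else {
            break;              /* a normal match failed: nothing more to find */
        }
    }
    return count;
}
```

For the subject "baac" this loop reports four matches: an empty string at offset 0, "aa" at offset 1, and empty strings at offsets 3 and 4.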


Modified: code/trunk/doc/pcretest.txt
===================================================================
--- code/trunk/doc/pcretest.txt    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/pcretest.txt    2009-10-05 10:59:35 UTC (rev 461)
@@ -335,6 +335,8 @@
                       (any number of digits)
          \R         pass the PCRE_DFA_RESTART option to pcre_dfa_exec()
          \S         output details of memory get/free calls during matching
+         \Y         pass the PCRE_NO_START_OPTIMIZE option to pcre_exec()
+                      or pcre_dfa_exec()
          \Z         pass the PCRE_NOTEOL option to pcre_exec()
                       or pcre_dfa_exec()
          \?         pass the PCRE_NO_UTF8_CHECK option to
@@ -661,5 +663,5 @@
 
 REVISION
 
-       Last updated: 11 September 2009
+       Last updated: 26 September 2009
        Copyright (c) 1997-2009 University of Cambridge.


Modified: code/trunk/doc/perltest.txt
===================================================================
--- code/trunk/doc/perltest.txt    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/doc/perltest.txt    2009-10-05 10:59:35 UTC (rev 461)
@@ -17,10 +17,10 @@
 The perltest.pl script can also test UTF-8 features. It recognizes the special
 modifier /8 that pcretest uses to invoke UTF-8 functionality. The testinput4
 and testinput6 files can be fed to perltest to run compatible UTF-8 tests.
-However, it is necessary to add "use utf8;" to the script to make this work 
-correctly. 
+However, it is necessary to add "use utf8;" to the script to make this work
+correctly.
 
-The testinput11 file contains tests that use features of Perl 5.10, so does not
+The testinput11 file contains tests that use features of Perl 5.10, so does not
 work with Perl 5.8.
 
 The other testinput files are not suitable for feeding to perltest.pl, since

Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/pcre_compile.c    2009-10-05 10:59:35 UTC (rev 461)
@@ -343,7 +343,7 @@
   "digit expected after (?+\0"
   "] is an invalid data character in JavaScript compatibility mode\0"
   /* 65 */
-  "different names for subpatterns of the same number are not allowed"; 
+  "different names for subpatterns of the same number are not allowed";



 /* Table to identify digits and hex digits. This is used when compiling
@@ -1102,7 +1102,7 @@
       if (name != NULL && lorn == ptr - thisname &&
           strncmp((const char *)name, (const char *)thisname, lorn) == 0)
         return *count;
-      term++;   
+      term++;
       }
     }
   }
@@ -1148,10 +1148,10 @@
           break;
         }
       else if (!negate_class && ptr[1] == CHAR_CIRCUMFLEX_ACCENT)
-        { 
+        {
         negate_class = TRUE;
         ptr++;
-        }  
+        }
       else break;
       }


@@ -1340,22 +1340,22 @@
 
 /* Scan a branch and compute the fixed length of subject that will match it,
 if the length is fixed. This is needed for dealing with backward assertions.
-In UTF8 mode, the result is in characters rather than bytes. The branch is
+In UTF8 mode, the result is in characters rather than bytes. The branch is
 temporarily terminated with OP_END when this function is called.
 
-This function is called when a backward assertion is encountered, so that if it
-fails, the error message can point to the correct place in the pattern.
+This function is called when a backward assertion is encountered, so that if it
+fails, the error message can point to the correct place in the pattern.
 However, we cannot do this when the assertion contains subroutine calls,
-because they can be forward references. We solve this by remembering this case
+because they can be forward references. We solve this by remembering this case
 and doing the check at the end; a flag specifies which mode we are running in.
 
 Arguments:
   code     points to the start of the pattern (the bracket)
   options  the compiling options
-  atend    TRUE if called when the pattern is complete
-  cd       the "compile data" structure
+  atend    TRUE if called when the pattern is complete
+  cd       the "compile data" structure
 
-Returns:   the fixed length,
+Returns:   the fixed length,
               or -1 if there is no fixed length,
               or -2 if \C was encountered
               or -3 if an OP_RECURSE item was encountered and atend is FALSE
@@ -1405,21 +1405,21 @@
     cc += 1 + LINK_SIZE;
     branchlength = 0;
     break;
-    
+
     /* A true recursion implies not fixed length, but a subroutine call may
     be OK. If the subroutine is a forward reference, we can't deal with
     it until the end of the pattern, so return -3. */
-    
+
     case OP_RECURSE:
     if (!atend) return -3;
     cs = ce = (uschar *)cd->start_code + GET(cc, 1);  /* Start subpattern */
     do ce += GET(ce, 1); while (*ce == OP_ALT);       /* End subpattern */
     if (cc > cs && cc < ce) return -1;                /* Recursion */
     d = find_fixedlength(cs + 2, options, atend, cd);
-    if (d < 0) return d; 
+    if (d < 0) return d;
     branchlength += d;
     cc += 1 + LINK_SIZE;
-    break;   
+    break;


     /* Skip over assertive subpatterns */


@@ -1459,7 +1459,7 @@
     branchlength++;
     cc += 2;
 #ifdef SUPPORT_UTF8
-    if ((options & PCRE_UTF8) != 0 && cc[-1] >= 0xc0) 
+    if ((options & PCRE_UTF8) != 0 && cc[-1] >= 0xc0)
       cc += _pcre_utf8_table4[cc[-1] & 0x3f];
 #endif
     break;
@@ -1471,7 +1471,7 @@
     branchlength += GET2(cc,1);
     cc += 4;
 #ifdef SUPPORT_UTF8
-    if ((options & PCRE_UTF8) != 0 && cc[-1] >= 0xc0) 
+    if ((options & PCRE_UTF8) != 0 && cc[-1] >= 0xc0)
       cc += _pcre_utf8_table4[cc[-1] & 0x3f];
 #endif
     break;
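
The two hunks above touch the idiom that steps over the trailing bytes of a UTF-8 character once its lead byte (0xc0 or above) has been seen. As a rough sketch of the same idea, the code below counts characters in a byte string. extra_bytes() is a hypothetical helper written for this note; it is assumed to play the role of the _pcre_utf8_table4 lookup and is not copied from PCRE. The input is assumed to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Number of continuation bytes that follow a given lead byte. This is an
   illustrative helper, assumed to mirror what the table lookup in the diff
   provides; it also covers the old 5- and 6-byte forms. */
static size_t extra_bytes(unsigned char lead)
{
    if (lead < 0xc0) return 0;  /* single-byte character */
    if (lead < 0xe0) return 1;
    if (lead < 0xf0) return 2;
    if (lead < 0xf8) return 3;
    if (lead < 0xfc) return 4;
    return 5;
}

/* Count characters, not bytes, in a well-formed UTF-8 buffer. */
size_t count_utf8_chars(const unsigned char *s, size_t nbytes)
{
    size_t i = 0, count = 0;

    assert(s != NULL || nbytes == 0);
    while (i < nbytes) {
        unsigned char lead = s[i++];
        if (lead >= 0xc0) i += extra_bytes(lead);  /* skip trailing bytes */
        count++;
    }
    return count;
}
```

For example, the six bytes 0x61 0xc3 0xa9 0xe2 0x82 0xac encode three characters, which is why the fixed-length calculation in the diff increments branchlength once per character and then skips the extra bytes.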
@@ -1556,8 +1556,8 @@


/* This little function scans through a compiled pattern until it finds a
capturing bracket with the given number, or, if the number is negative, an
-instance of OP_REVERSE for a lookbehind. The function is global in the C sense
-so that it can be called from pcre_study() when finding the minimum matching
+instance of OP_REVERSE for a lookbehind. The function is global in the C sense
+so that it can be called from pcre_study() when finding the minimum matching
length.

Arguments:
@@ -1581,12 +1581,12 @@
the table is zero; the actual length is stored in the compiled code. */

   if (c == OP_XCLASS) code += GET(code, 1);
-  
+
   /* Handle recursion */
-  
+
   else if (c == OP_REVERSE)
     {
-    if (number < 0) return (uschar *)code; 
+    if (number < 0) return (uschar *)code;
     code += _pcre_OP_lengths[c];
     }


@@ -1957,7 +1957,7 @@
     case OP_POSQUERY:
     if (utf8 && code[1] >= 0xc0) code += _pcre_utf8_table4[code[1] & 0x3f];
     break;
- 
+
     case OP_UPTO:
     case OP_MINUPTO:
     case OP_POSUPTO:
@@ -3915,15 +3915,15 @@


       if (repeat_max == 0) goto END_REPEAT;


-      /*--------------------------------------------------------------------*/ 
+      /*--------------------------------------------------------------------*/
       /* This code is obsolete from release 8.00; the restriction was finally
       removed: */
-       
+
       /* All real repeats make it impossible to handle partial matching (maybe
       one day we will be able to remove this restriction). */
-       
+
       /* if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL; */
-      /*--------------------------------------------------------------------*/ 
+      /*--------------------------------------------------------------------*/


       /* Combine the op_type with the repeat_type */


@@ -4070,7 +4070,7 @@
         goto END_REPEAT;
         }


-      /*--------------------------------------------------------------------*/ 
+      /*--------------------------------------------------------------------*/
       /* This code is obsolete from release 8.00; the restriction was finally
       removed: */


@@ -4078,7 +4078,7 @@
       one day we will be able to remove this restriction). */


       /* if (repeat_max != 1) cd->external_flags |= PCRE_NOPARTIAL; */
-      /*--------------------------------------------------------------------*/ 
+      /*--------------------------------------------------------------------*/


       if (repeat_min == 0 && repeat_max == -1)
         *code++ = OP_CRSTAR + repeat_type;
@@ -4393,11 +4393,11 @@
     if (possessive_quantifier)
       {
       int len;
-       
+
       if (*tempcode == OP_TYPEEXACT)
         tempcode += _pcre_OP_lengths[*tempcode] +
-          ((tempcode[3] == OP_PROP || tempcode[3] == OP_NOTPROP)? 2 : 0);   
-           
+          ((tempcode[3] == OP_PROP || tempcode[3] == OP_NOTPROP)? 2 : 0);
+
       else if (*tempcode == OP_EXACT || *tempcode == OP_NOTEXACT)
         {
         tempcode += _pcre_OP_lengths[*tempcode];
@@ -4405,8 +4405,8 @@
         if (utf8 && tempcode[-1] >= 0xc0)
           tempcode += _pcre_utf8_table4[tempcode[-1] & 0x3f];
 #endif
-        }         
- 
+        }
+
       len = code - tempcode;
       if (len > 0) switch (*tempcode)
         {
@@ -4485,17 +4485,17 @@
             strncmp((char *)name, vn, namelen) == 0)
           {
           /* Check for open captures before ACCEPT */
-            
+
           if (verbs[i].op == OP_ACCEPT)
             {
-            open_capitem *oc; 
-            cd->had_accept = TRUE; 
+            open_capitem *oc;
+            cd->had_accept = TRUE;
             for (oc = cd->open_caps; oc != NULL; oc = oc->next)
               {
               *code++ = OP_CLOSE;
-              PUT2INC(code, 0, oc->number); 
-              }  
-            }  
+              PUT2INC(code, 0, oc->number);
+              }
+            }
           *code++ = verbs[i].op;
           break;
           }
@@ -4658,9 +4658,9 @@
           }


         /* Otherwise (did not start with "+" or "-"), start by looking for the
-        name. If we find a name, add one to the opcode to change OP_CREF or 
-        OP_RREF into OP_NCREF or OP_NRREF. These behave exactly the same, 
-        except they record that the reference was originally to a name. The 
+        name. If we find a name, add one to the opcode to change OP_CREF or
+        OP_RREF into OP_NCREF or OP_NRREF. These behave exactly the same,
+        except they record that the reference was originally to a name. The
         information is used to check duplicate names. */


         slot = cd->name_table;
@@ -4887,7 +4887,7 @@
           is because the number of names, and hence the table size, is computed
           in the pre-compile, and it affects various numbers and pointers which
           would all have to be modified, and the compiled code moved down, if
-          duplicates with the same number were omitted from the table. This 
+          duplicates with the same number were omitted from the table. This
           doesn't seem worth the hassle. However, *different* names for the
           same number are not permitted. */


@@ -4895,7 +4895,7 @@
             {
             BOOL dupname = FALSE;
             slot = cd->name_table;
- 
+
             for (i = 0; i < cd->names_found; i++)
               {
               int crc = memcmp(name, slot+2, namelen);
@@ -4909,31 +4909,31 @@
                     *errorcodeptr = ERR43;
                     goto FAILED;
                     }
-                  else dupname = TRUE;   
+                  else dupname = TRUE;
                   }
                 else crc = -1;      /* Current name is a substring */
                 }
-                
-              /* Make space in the table and break the loop for an earlier 
-              name. For a duplicate or later name, carry on. We do this for 
-              duplicates so that in the simple case (when ?(| is not used) they 
+
+              /* Make space in the table and break the loop for an earlier
+              name. For a duplicate or later name, carry on. We do this for
+              duplicates so that in the simple case (when ?(| is not used) they
               are in order of their numbers. */
-                
+
               if (crc < 0)
                 {
                 memmove(slot + cd->name_entry_size, slot,
                   (cd->names_found - i) * cd->name_entry_size);
                 break;
                 }
-                
+
               /* Continue the loop for a later or duplicate name */
-               
+
               slot += cd->name_entry_size;
               }
-              
+
             /* For non-duplicate names, check for a duplicate number before
             adding the new name. */
-              
+
             if (!dupname)
               {
               uschar *cslot = cd->name_table;
@@ -4945,12 +4945,12 @@
                     {
                     *errorcodeptr = ERR65;
                     goto FAILED;
-                    }    
+                    }
                   }
-                else i--;   
+                else i--;
                 cslot += cd->name_entry_size;
-                }  
-              }   
+                }
+              }


             PUT2(slot, 0, cd->bracount + 1);
             memcpy(slot + 2, name, namelen);
@@ -5131,7 +5131,7 @@
           if (lengthptr == NULL)
             {
             *code = OP_END;
-            if (recno != 0) 
+            if (recno != 0)
               called = _pcre_find_bracket(cd->start_code, utf8, recno);


             /* Forward reference */
@@ -5812,8 +5812,8 @@
   capnumber = GET2(code, 1 + LINK_SIZE);
   capitem.number = capnumber;
   capitem.next = cd->open_caps;
-  cd->open_caps = &capitem;  
-  } 
+  cd->open_caps = &capitem;
+  }


/* Offset is set zero to mark that this bracket is still open */

@@ -5909,10 +5909,10 @@

     /* If lookbehind, check that this branch matches a fixed-length string, and
     put the length into the OP_REVERSE item. Temporarily mark the end of the
-    branch with OP_END. If the branch contains OP_RECURSE, the result is -3 
+    branch with OP_END. If the branch contains OP_RECURSE, the result is -3
     because there may be forward references that we can't check here. Set a
-    flag to cause another lookbehind check at the end. Why not do it all at the 
-    end? Because common, erroneous checks are picked up here and the offset of 
+    flag to cause another lookbehind check at the end. Why not do it all at the
+    end? Because common, erroneous checks are picked up here and the offset of
     the problem can be shown. */


     if (lookbehind)
@@ -5923,8 +5923,8 @@
       DPRINTF(("fixed length = %d\n", fixed_length));
       if (fixed_length == -3)
         {
-        cd->check_lookbehind = TRUE; 
-        }   
+        cd->check_lookbehind = TRUE;
+        }
       else if (fixed_length < 0)
         {
         *errorcodeptr = (fixed_length == -2)? ERR36 : ERR25;
@@ -5958,9 +5958,9 @@
         }
       while (branch_length > 0);
       }
-      
+
     /* If it was a capturing subpattern, remove it from the chain. */
-    
+
     if (capnumber > 0) cd->open_caps = cd->open_caps->next;


     /* Fill in the ket */
@@ -6654,7 +6654,7 @@


if (errorcode == 0 && re->top_backref > re->top_bracket) errorcode = ERR15;

-/* If there were any lookbehind assertions that contained OP_RECURSE 
+/* If there were any lookbehind assertions that contained OP_RECURSE
 (recursions or subroutine calls), a flag is set for them to be checked here,
 because they may contain forward references. Actual recursions can't be fixed
 length, but subroutine calls can. It is done like this so that those without
@@ -6665,21 +6665,21 @@
 if (cd->check_lookbehind)
   {
   uschar *cc = (uschar *)codestart;
-   
-  /* Loop, searching for OP_REVERSE items, and process those that do not have 
-  their length set. (Actually, it will also re-process any that have a length 
-  of zero, but that is a pathological case, and it does no harm.) When we find 
+
+  /* Loop, searching for OP_REVERSE items, and process those that do not have
+  their length set. (Actually, it will also re-process any that have a length
+  of zero, but that is a pathological case, and it does no harm.) When we find
   one, we temporarily terminate the branch it is in while we scan it. */
-   
+
   for (cc = (uschar *)_pcre_find_bracket(codestart, utf8, -1);
        cc != NULL;
        cc = (uschar *)_pcre_find_bracket(cc, utf8, -1))
-    { 
+    {
     if (GET(cc, 1) == 0)
-      { 
-      int fixed_length; 
+      {
+      int fixed_length;
       uschar *be = cc - 1 - LINK_SIZE + GET(cc, -LINK_SIZE);
-      int end_op = *be; 
+      int end_op = *be;
       *be = OP_END;
       fixed_length = find_fixedlength(cc, re->options, TRUE, cd);
       *be = end_op;
@@ -6687,13 +6687,13 @@
       if (fixed_length < 0)
         {
         errorcode = (fixed_length == -2)? ERR36 : ERR25;
-        break;   
+        break;
         }
       PUT(cc, 1, fixed_length);
       }
     cc += 1 + LINK_SIZE;
-    }  
-  } 
+    }
+  }


/* Failed to compile, or error while post-processing */
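
Several comments in the pcre_compile.c hunks above describe how a new group name is placed in the name table: the loop breaks and makes space for a name that sorts earlier, and carries on past later or duplicate names so that duplicates end up in order of their numbers. The following is a simplified sketch of that insertion only. The struct, the fixed sizes, and the name_table_add() function are assumptions made for illustration, and the ERR43 and ERR65 checks from the real code are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { ENTRY_SIZE = 10, MAX_ENTRIES = 8 };  /* illustrative sizes only */

/* Each entry: two-byte group number (high byte first), then the name,
   zero-terminated. Entries are kept ordered by name. */
typedef struct {
    unsigned char table[ENTRY_SIZE * MAX_ENTRIES];
    int count;
} name_table;

/* Insert (name, number). Returns 0 on success, -1 if there is no room. */
int name_table_add(name_table *t, const char *name, int number)
{
    size_t namelen;
    unsigned char *slot;
    int i;

    assert(t != NULL && name != NULL);
    namelen = strlen(name);
    if (t->count >= MAX_ENTRIES || namelen + 3 > ENTRY_SIZE) return -1;

    slot = t->table;
    for (i = 0; i < t->count; i++) {
        int crc = memcmp(name, slot + 2, namelen);
        /* Same leading bytes but the stored name is longer: the new name
           is a substring of it and therefore sorts first. */
        if (crc == 0 && slot[2 + namelen] != 0) crc = -1;
        if (crc < 0) {
            /* Make space for an earlier name and stop looking. */
            memmove(slot + ENTRY_SIZE, slot,
                    (size_t)(t->count - i) * ENTRY_SIZE);
            break;
        }
        slot += ENTRY_SIZE;  /* later or duplicate name: carry on */
    }

    slot[0] = (unsigned char)(number >> 8);
    slot[1] = (unsigned char)(number & 255);
    memcpy(slot + 2, name, namelen);
    slot[2 + namelen] = 0;
    t->count++;
    return 0;
}
```

Adding "zeta" (1), "alpha" (2), "alp" (3) and "alpha" (4) in that order leaves the table as alp, alpha (2), alpha (4), zeta.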


Modified: code/trunk/pcre_dfa_exec.c
===================================================================
--- code/trunk/pcre_dfa_exec.c    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/pcre_dfa_exec.c    2009-10-05 10:59:35 UTC (rev 461)
@@ -45,9 +45,9 @@
 applications. */



-/* NOTE ABOUT PERFORMANCE: A user of this function sent some code that improved
-the performance of his patterns greatly. I could not use it as it stood, as it
-was not thread safe, and made assumptions about pattern sizes. Also, it caused
+/* NOTE ABOUT PERFORMANCE: A user of this function sent some code that improved
+the performance of his patterns greatly. I could not use it as it stood, as it
+was not thread safe, and made assumptions about pattern sizes. Also, it caused
test 7 to loop, and test 9 to crash with a segfault.

The issue is the check for duplicate states, which is done by a simple linear
@@ -68,7 +68,7 @@
of internal_dfa_exec(). (The supplied patch used a static vector, initialized
only once - I suspect this was the cause of the problems with the tests.)

-Overall, I concluded that the gains in some cases did not outweigh the losses
+Overall, I concluded that the gains in some cases did not outweigh the losses
in others, so I abandoned this code. */


@@ -417,12 +417,12 @@
       current_subject - start_subject : max_back;
     current_subject -= gone_back;
     }
-    
+
   /* Save the earliest consulted character */
-  
-  if (current_subject < md->start_used_ptr) 
-    md->start_used_ptr = current_subject; 


+  if (current_subject < md->start_used_ptr)
+    md->start_used_ptr = current_subject;
+
   /* Now we can process the individual branches. */


end_code = this_start_code;
@@ -488,7 +488,7 @@
int clen, dlen;
unsigned int c, d;
int forced_fail = 0;
- int reached_end = 0;
+ int reached_end = 0;

   /* Make the new state list into the active state list and empty the
   new state list. */
@@ -578,7 +578,7 @@
         }
       }


-    /* Check for a duplicate state with the same count, and skip if found. 
+    /* Check for a duplicate state with the same count, and skip if found.
     See the note at the head of this module about the possibility of improving
     performance here. */


@@ -647,7 +647,7 @@
 /* ========================================================================== */
       /* Reached a closing bracket. If not at the end of the pattern, carry
       on with the next opcode. Otherwise, unless we have an empty string and
-      PCRE_NOTEMPTY is set, or PCRE_NOTEMPTY_ATSTART is set and we are at the 
+      PCRE_NOTEMPTY is set, or PCRE_NOTEMPTY_ATSTART is set and we are at the
       start of the subject, save the match data, shifting up all previous
       matches so we always have the longest first. */


@@ -662,10 +662,10 @@
           ADD_ACTIVE(state_offset - GET(code, 1), 0);
           }
         }
-      else 
+      else
         {
-        reached_end++;    /* Count branches that reach the end */ 
-        if (ptr > current_subject || 
+        reached_end++;    /* Count branches that reach the end */
+        if (ptr > current_subject ||
             ((md->moptions & PCRE_NOTEMPTY) == 0 &&
               ((md->moptions & PCRE_NOTEMPTY_ATSTART) == 0 ||
                 current_subject > start_subject + md->start_offset)))
@@ -689,7 +689,7 @@
               match_count, rlevel*2-2, SP));
             return match_count;
             }
-          }   
+          }
         }
       break;


@@ -839,7 +839,7 @@
         if (ptr > start_subject)
           {
           const uschar *temp = ptr - 1;
-          if (temp < md->start_used_ptr) md->start_used_ptr = temp; 
+          if (temp < md->start_used_ptr) md->start_used_ptr = temp;
 #ifdef SUPPORT_UTF8
           if (utf8) BACKCHAR(temp);
 #endif
@@ -848,13 +848,13 @@
           }
         else left_word = 0;


-        if (clen > 0) 
+        if (clen > 0)
           right_word = c < 256 && (ctypes[c] & ctype_word) != 0;
         else              /* This is a fudge to ensure that if this is the */
           {               /* last item in the pattern, we don't count it as */
           reached_end--;  /* reached, thus disabling a partial match. */
           right_word = 0;
-          } 
+          }


         if ((left_word == right_word) == (codevalue == OP_NOT_WORD_BOUNDARY))
           { ADD_ACTIVE(state_offset + 1, 0); }
@@ -2287,7 +2287,7 @@


         /* Back reference conditions are not supported */


-        if (condcode == OP_CREF || condcode == OP_NCREF) 
+        if (condcode == OP_CREF || condcode == OP_NCREF)
           return PCRE_ERROR_DFA_UCOND;


         /* The DEFINE condition is always false */
@@ -2531,7 +2531,7 @@
   if (new_count <= 0)
     {
     if (rlevel == 1 &&                               /* Top level, and */
-        reached_end != workspace[1] &&               /* Not all reached end */ 
+        reached_end != workspace[1] &&               /* Not all reached end */
         forced_fail != workspace[1] &&               /* Not all forced fail & */
         (                                            /* either... */
         (md->moptions & PCRE_PARTIAL_HARD) != 0      /* Hard partial */
@@ -2652,7 +2652,7 @@
   if ((flags & PCRE_EXTRA_TABLES) != 0)
     md->tables = extra_data->tables;
   }
-  
+
 /* Check that the first field in the block is the magic number. If it is not,
 test for a regex that was compiled on a host of opposite endianness. If this is
 the case, flipped values are put in internal_re and internal_study if there was
@@ -2914,13 +2914,13 @@


     end_subject = save_end_subject;


-    /* The following two optimizations are disabled for partial matching or if 
-    disabling is explicitly requested (and of course, by the test above, this 
+    /* The following two optimizations are disabled for partial matching or if
+    disabling is explicitly requested (and of course, by the test above, this
     code is not obeyed when restarting after a partial match). */
-    
+
     if ((options & PCRE_NO_START_OPTIMIZE) == 0 &&
         (options & (PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT)) == 0)
-      {   
+      {
       /* If the pattern was studied, a minimum subject length may be set. This
       is a lower bound; no actual string of that length may actually match the
       pattern. Although the value is, strictly, in characters, we treat it as
@@ -2929,7 +2929,7 @@
       if (study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0 &&
           end_subject - current_subject < study->minlength)
         return PCRE_ERROR_NOMATCH;
-    
+
       /* If req_byte is set, we know that that character must appear in the
       subject for the match to succeed. If the first character is set, req_byte
       must be later in the subject; otherwise the test starts at the match
@@ -2937,19 +2937,19 @@
       nested unlimited repeats that aren't going to match. Writing separate
       code for cased/caseless versions makes it go faster, as does using an
       autoincrement and backing off on a match.
-      
+
       HOWEVER: when the subject string is very, very long, searching to its end
       can take a long time, and give bad performance on quite ordinary
       patterns. This showed up when somebody was matching /^C/ on a 32-megabyte
       string... so we don't do this when the string is sufficiently long. */
-      
+
       if (req_byte >= 0 && end_subject - current_subject < REQ_BYTE_MAX)
         {
         register const uschar *p = current_subject + ((first_byte >= 0)? 1 : 0);
-      
+
         /* We don't need to repeat the search if we haven't yet reached the
         place we found it at last time. */
-      
+
         if (p > req_byte_ptr)
           {
           if (req_byte_caseless)
@@ -2967,26 +2967,26 @@
               if (*p++ == req_byte) { p--; break; }
               }
             }
-      
+
           /* If we can't find the required character, break the matching loop,
           which will cause a return or PCRE_ERROR_NOMATCH. */
-      
+
           if (p >= end_subject) break;
-      
+
           /* If we have found the required character, save the point where we
           found it, so that we don't search again next time round the loop if
           the start hasn't passed this character yet. */
-      
+
           req_byte_ptr = p;
           }
-        }   
+        }
       }
     }   /* End of optimizations that are done when not restarting */


/* OK, now we can do the business */

   md->start_used_ptr = current_subject;
-   
+
   rc = internal_dfa_exec(
     md,                                /* fixed match data */
     md->start_code,                    /* this subexpression's code */
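
The req_byte comments in the last hunks above describe a pre-check: before a match attempt, look ahead for a byte that must appear in the subject, remember where it was found so the search is not repeated, and skip the whole check for very long subjects. A cut-down sketch of that logic follows. The function name, the found cache argument and REQ_BYTE_LIMIT are assumptions for this illustration (the real code uses REQ_BYTE_MAX and also has a caseless variant, omitted here).

```c
#include <assert.h>
#include <stddef.h>

enum { REQ_BYTE_LIMIT = 1000 };  /* illustrative stand-in for REQ_BYTE_MAX */

/* Returns 1 if a match attempt starting at start is worth making, or 0 if
   the required byte cannot be found. *found caches the place where the byte
   was last seen; the caller initialises it to NULL. */
int req_byte_ok(const unsigned char *start, const unsigned char *end,
                int req_byte, int have_first_byte,
                const unsigned char **found)
{
    const unsigned char *p = start;

    assert(start <= end && found != NULL);

    /* No required byte, or subject too long for the search to pay off. */
    if (req_byte < 0 || end - start >= REQ_BYTE_LIMIT) return 1;

    /* If a first byte is known, the required byte must come after it. */
    if (have_first_byte && p < end) p++;

    /* No need to search again before the place it was found last time. */
    if (*found != NULL && p <= *found) return 1;

    while (p < end && *p != req_byte) p++;
    if (p >= end) return 0;  /* absent: no match is possible from here */

    *found = p;              /* remember the hit for later start points */
    return 1;
}
```

A caller would run this once per candidate start position and stop the outer matching loop as soon as it returns 0, which corresponds to the "break the matching loop" comment in the diff.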


Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/pcre_exec.c    2009-10-05 10:59:35 UTC (rev 461)
@@ -843,34 +843,34 @@
       {
       if (md->recursive == NULL)                /* Not recursing => FALSE */
         {
-        condition = FALSE;  
-        ecode += GET(ecode, 1);                         
-        } 
+        condition = FALSE;
+        ecode += GET(ecode, 1);
+        }
       else
-        {    
+        {
         int recno = GET2(ecode, LINK_SIZE + 2);   /* Recursion group number*/
         condition =  (recno == RREF_ANY || recno == md->recursive->group_num);
-          
+
         /* If the test is for recursion into a specific subpattern, and it is
         false, but the test was set up by name, scan the table to see if the
         name refers to any other numbers, and test them. The condition is true
         if any one is set. */
-         
+
         if (!condition && condcode == OP_NRREF && recno != RREF_ANY)
           {
           uschar *slotA = md->name_table;
           for (i = 0; i < md->name_count; i++)
-            { 
-            if (GET2(slotA, 0) == recno) break; 
+            {
+            if (GET2(slotA, 0) == recno) break;
             slotA += md->name_entry_size;
             }
-             
+
           /* Found a name for the number - there can be only one; duplicate
           names for different numbers are allowed, but not vice versa. First
           scan down for duplicates. */
-            
+
           if (i < md->name_count)
-            {    
+            {
             uschar *slotB = slotA;
             while (slotB > md->name_table)
               {
@@ -878,15 +878,15 @@
               if (strcmp((char *)slotA + 2, (char *)slotB + 2) == 0)
                 {
                 condition = GET2(slotB, 0) == md->recursive->group_num;
-                if (condition) break;   
-                }    
+                if (condition) break;
+                }
               else break;
-              } 
-        
+              }
+
             /* Scan up for duplicates */
-        
+
             if (!condition)
-              { 
+              {
               slotB = slotA;
               for (i++; i < md->name_count; i++)
                 {
@@ -895,46 +895,46 @@
                   {
                   condition = GET2(slotB, 0) == md->recursive->group_num;
                   if (condition) break;
-                  }    
+                  }
                 else break;
-                }  
-              } 
+                }
+              }
             }
-          }  
-        
+          }
+
         /* Chose branch according to the condition */
-         
+
         ecode += condition? 3 : GET(ecode, 1);
         }
-      }   
+      }


     else if (condcode == OP_CREF || condcode == OP_NCREF)  /* Group used test */
       {
       offset = GET2(ecode, LINK_SIZE+2) << 1;  /* Doubled ref number */
       condition = offset < offset_top && md->offset_vector[offset] >= 0;
-      
+
       /* If the numbered capture is unset, but the reference was by name,
-      scan the table to see if the name refers to any other numbers, and test 
-      them. The condition is true if any one is set. This is tediously similar 
-      to the code above, but not close enough to try to amalgamate. */ 
-      
+      scan the table to see if the name refers to any other numbers, and test
+      them. The condition is true if any one is set. This is tediously similar
+      to the code above, but not close enough to try to amalgamate. */
+
       if (!condition && condcode == OP_NCREF)
         {
-        int refno = offset >> 1; 
+        int refno = offset >> 1;
         uschar *slotA = md->name_table;
-         
+
         for (i = 0; i < md->name_count; i++)
-          { 
-          if (GET2(slotA, 0) == refno) break; 
+          {
+          if (GET2(slotA, 0) == refno) break;
           slotA += md->name_entry_size;
           }
-           
-        /* Found a name for the number - there can be only one; duplicate names 
-        for different numbers are allowed, but not vice versa. First scan down 
+
+        /* Found a name for the number - there can be only one; duplicate names
+        for different numbers are allowed, but not vice versa. First scan down
         for duplicates. */
-          
+
         if (i < md->name_count)
-          {    
+          {
           uschar *slotB = slotA;
           while (slotB > md->name_table)
             {
@@ -942,17 +942,17 @@
             if (strcmp((char *)slotA + 2, (char *)slotB + 2) == 0)
               {
               offset = GET2(slotB, 0) << 1;
-              condition = offset < offset_top && 
+              condition = offset < offset_top &&
                 md->offset_vector[offset] >= 0;
-              if (condition) break;   
-              }    
+              if (condition) break;
+              }
             else break;
-            } 
-      
+            }
+
           /* Scan up for duplicates */
-      
+
           if (!condition)
-            { 
+            {
             slotB = slotA;
             for (i++; i < md->name_count; i++)
               {
@@ -960,16 +960,16 @@
               if (strcmp((char *)slotA + 2, (char *)slotB + 2) == 0)
                 {
                 offset = GET2(slotB, 0) << 1;
-                condition = offset < offset_top && 
+                condition = offset < offset_top &&
                   md->offset_vector[offset] >= 0;
-                if (condition) break;   
-                }    
+                if (condition) break;
+                }
               else break;
-              } 
-            }   
+              }
+            }
           }
-        }  
-         
+        }
+
       /* Chose branch according to the condition */


       ecode += condition? 3 : GET(ecode, 1);
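
The hunk above tests a named condition by finding the table entry for the referenced number and then scanning down and up through neighbouring entries that carry the same name. The sketch below restates that scan as a standalone function. The table layout follows the comments in the diff (two-byte number, high byte first, then a zero-terminated name, entries ordered by name); the function name and the is_set array are hypothetical, and the caller is assumed to pass an is_set array large enough for every group number in the table.

```c
#include <assert.h>
#include <string.h>

/* Read the two-byte group number at the start of a table entry. */
static int entry_number(const unsigned char *slot)
{
    return (slot[0] << 8) | slot[1];
}

/* Return 1 if group refno, or any group sharing its name, is set. */
int named_group_set(const unsigned char *table, int count, int entry_size,
                    int refno, const int *is_set)
{
    const unsigned char *slotA = table;
    const unsigned char *slotB;
    int i;

    assert(table != NULL && is_set != NULL && entry_size > 2);

    if (is_set[refno]) return 1;  /* the numbered test itself */

    /* Find the entry that names this number. */
    for (i = 0; i < count; i++) {
        if (entry_number(slotA) == refno) break;
        slotA += entry_size;
    }
    if (i >= count) return 0;     /* the number has no name */

    /* Scan down for duplicates of the name. */
    slotB = slotA;
    while (slotB > table) {
        slotB -= entry_size;
        if (strcmp((const char *)slotA + 2, (const char *)slotB + 2) != 0)
            break;
        if (is_set[entry_number(slotB)]) return 1;
    }

    /* Scan up for duplicates of the name. */
    slotB = slotA;
    for (i++; i < count; i++) {
        slotB += entry_size;
        if (strcmp((const char *)slotA + 2, (const char *)slotB + 2) != 0)
            break;
        if (is_set[entry_number(slotB)]) return 1;
    }
    return 0;
}
```

Because the table is ordered by name, entries with the same name are adjacent, so both scans can stop at the first name that differs.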
@@ -1030,15 +1030,15 @@
       ecode += 1 + LINK_SIZE;
       }
     break;
-    


+
     /* Before OP_ACCEPT there may be any number of OP_CLOSE opcodes,
     to close any currently open capturing brackets. */
-    
+
     case OP_CLOSE:
-    number = GET2(ecode, 1); 
+    number = GET2(ecode, 1);
     offset = number << 1;
-      
+
 #ifdef DEBUG
       printf("end bracket %d at *ACCEPT", number);
       printf("\n");
@@ -1053,7 +1053,7 @@
       if (offset_top <= offset) offset_top = offset + 2;
       }
     ecode += 3;
-    break;   
+    break;



     /* End of the pattern, either real or forced. If we are in a top-level
@@ -1069,7 +1069,7 @@
       md->recursive = rec->prevrec;
       memmove(md->offset_vector, rec->offset_save,
         rec->saved_max * sizeof(int));
-      offset_top = rec->offset_top;   
+      offset_top = rec->save_offset_top;
       mstart = rec->save_start;
       ims = original_ims;
       ecode = rec->after_call;
@@ -1261,7 +1261,7 @@
       memcpy(new_recursive.offset_save, md->offset_vector,
             new_recursive.saved_max * sizeof(int));
       new_recursive.save_start = mstart;
-      new_recursive.offset_top = offset_top; 
+      new_recursive.save_offset_top = offset_top;
       mstart = eptr;


       /* OK, now we can do the recursion. For each top-level alternative we
@@ -1460,7 +1460,7 @@
       {
       number = GET2(prev, 1+LINK_SIZE);
       offset = number << 1;
-      
+
 #ifdef DEBUG
       printf("end bracket %d", number);
       printf("\n");
@@ -1486,7 +1486,7 @@
         mstart = rec->save_start;
         memcpy(md->offset_vector, rec->offset_save,
           rec->saved_max * sizeof(int));
-        offset_top = rec->offset_top;   
+        offset_top = rec->save_offset_top;
         ecode = rec->after_call;
         ims = original_ims;
         break;
@@ -5010,7 +5010,7 @@
    (offsets == NULL && offsetcount > 0)) return PCRE_ERROR_NULL;
 if (offsetcount < 0) return PCRE_ERROR_BADCOUNT;


-/* This information is for finding all the numbers associated with a given
+/* This information is for finding all the numbers associated with a given
name, for condition testing. */

md->name_table = (uschar *)re + re->name_table_offset;
@@ -5375,24 +5375,24 @@
/* Restore fudged end_subject */

   end_subject = save_end_subject;
-  
-  /* The following two optimizations are disabled for partial matching or if 
+
+  /* The following two optimizations are disabled for partial matching or if
   disabling is explicitly requested. */
-  
-  if ((options & PCRE_NO_START_OPTIMIZE) == 0 && !md->partial) 
-    { 
+
+  if ((options & PCRE_NO_START_OPTIMIZE) == 0 && !md->partial)
+    {
     /* If the pattern was studied, a minimum subject length may be set. This is
     a lower bound; no actual string of that length may actually match the
     pattern. Although the value is, strictly, in characters, we treat it as
     bytes to avoid spending too much time in this optimization. */
-    
+
     if (study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0 &&
         end_subject - start_match < study->minlength)
       {
       rc = MATCH_NOMATCH;
-      break;     
+      break;
       }
- 
+
     /* If req_byte is set, we know that that character must appear in the
     subject for the match to succeed. If the first character is set, req_byte
     must be later in the subject; otherwise the test starts at the match point.
@@ -5400,20 +5400,20 @@
     nested unlimited repeats that aren't going to match. Writing separate code
     for cased/caseless versions makes it go faster, as does using an
     autoincrement and backing off on a match.
-    
+
     HOWEVER: when the subject string is very, very long, searching to its end
     can take a long time, and give bad performance on quite ordinary patterns.
     This showed up when somebody was matching something like /^\d+C/ on a
     32-megabyte string... so we don't do this when the string is sufficiently
     long. */
-    
+
     if (req_byte >= 0 && end_subject - start_match < REQ_BYTE_MAX)
       {
       register USPTR p = start_match + ((first_byte >= 0)? 1 : 0);
-    
+
       /* We don't need to repeat the search if we haven't yet reached the
       place we found it at last time. */
-    
+
       if (p > req_byte_ptr)
         {
         if (req_byte_caseless)
@@ -5431,24 +5431,24 @@
             if (*p++ == req_byte) { p--; break; }
             }
           }
-    
+
         /* If we can't find the required character, break the matching loop,
         forcing a match failure. */
-    
+
         if (p >= end_subject)
           {
           rc = MATCH_NOMATCH;
           break;
           }
-    
+
         /* If we have found the required character, save the point where we
         found it, so that we don't search again next time round the loop if
         the start hasn't passed this character yet. */
-    
+
         req_byte_ptr = p;
         }
       }
-    }   
+    }


#ifdef DEBUG /* Sigh. Some compilers never learn. */
printf(">>>> Match against: ");
@@ -5575,7 +5575,7 @@
too many to fit into the vector. */

rc = md->offset_overflow? 0 : md->end_offset_top/2;
-
+
/* If there is space, set up the whole thing as substring 0. The value of
md->start_match_ptr might be modified if \K was encountered on the success
matching path. */
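
The required-byte optimization in the pcre_exec.c hunk above can be sketched in isolation. This is a minimal illustration, not the PCRE code: `required_byte_ok` and `REQ_SCAN_LIMIT` are hypothetical names (the latter stands in for `REQ_BYTE_MAX`, whose value is not shown in this diff), and the caseless branch is omitted.

```c
#include <stddef.h>

/* Hypothetical stand-in for REQ_BYTE_MAX; the real value is not shown here. */
#define REQ_SCAN_LIMIT 1000

/* Returns 1 if matching may proceed from `start`, or 0 if the required byte
   cannot be found before `end`, in which case the match must fail.
   `*last_found` caches the position where the byte was last found (NULL if
   it has not been found yet), so the scan is not repeated needlessly. */
static int
required_byte_ok(const unsigned char *start, const unsigned char *end,
                 int req_byte, int has_first_byte,
                 const unsigned char **last_found)
{
  const unsigned char *p = start;

  /* No required byte, or subject too long to be worth scanning. */
  if (req_byte < 0 || end - start >= REQ_SCAN_LIMIT) return 1;

  /* If a first byte is known, the required byte must come after it. */
  if (has_first_byte && p < end) p++;

  /* The cached position is still ahead of us: nothing to do. */
  if (*last_found != NULL && p <= *last_found) return 1;

  while (p < end && *p != (unsigned char)req_byte) p++;
  if (p >= end) return 0;

  *last_found = p;   /* Remember where we found it for the next attempt. */
  return 1;
}
```

The cache is the point of the design: as the start position advances one character at a time, the subject is rescanned only after the start has passed the last place the byte was seen.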

Modified: code/trunk/pcre_fullinfo.c
===================================================================
--- code/trunk/pcre_fullinfo.c    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/pcre_fullinfo.c    2009-10-05 10:59:35 UTC (rev 461)
@@ -122,12 +122,12 @@
     (study != NULL && (study->flags & PCRE_STUDY_MAPPED) != 0)?
       ((const pcre_study_data *)extra_data->study_data)->start_bits : NULL;
   break;
-  
+
   case PCRE_INFO_MINLENGTH:
   *((int *)where) =
     (study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0)?
       study->minlength : -1;
-  break;         
+  break;


case PCRE_INFO_LASTLITERAL:
*((int *)where) =
@@ -152,7 +152,7 @@

   /* From release 8.00 this will always return TRUE because NOPARTIAL is
   no longer ever set (the restrictions have been removed). */
-    
+
   case PCRE_INFO_OKPARTIAL:
   *((int *)where) = (re->flags & PCRE_NOPARTIAL) == 0;
   break;


Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/pcre_internal.h    2009-10-05 10:59:35 UTC (rev 461)
@@ -1348,7 +1348,7 @@
   OP_SCOND,          /* 99 Conditional group, check empty */


   /* The next two pairs must (respectively) be kept together. */
-   
+
   OP_CREF,           /* 100 Used to hold a capture number as condition */
  OP_NCREF,          /* 101 Same, but generated by a name reference */
   OP_RREF,           /* 102 Used to hold a recursion number as condition */
@@ -1588,7 +1588,7 @@
   USPTR save_start;             /* Old value of mstart */
   int *offset_save;             /* Pointer to start of saved offsets */
   int saved_max;                /* Number of saved offsets */
-  int offset_top;               /* Current value of offset_top */
+  int save_offset_top;          /* Current value of offset_top */
 } recursion_info;


 /* Structure for building a chain of data for holding the values of the subject
@@ -1615,7 +1615,7 @@
   int    nllen;                 /* Newline string length */
   int    name_count;            /* Number of names in name table */
   int    name_entry_size;       /* Size of entry in names table */
-  uschar *name_table;           /* Table of names */  
+  uschar *name_table;           /* Table of names */
   uschar nl[4];                 /* Newline string when fixed */
   const uschar *lcc;            /* Points to lower casing table */
   const uschar *ctypes;         /* Points to table of type maps */


Modified: code/trunk/pcre_printint.src
===================================================================
--- code/trunk/pcre_printint.src    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/pcre_printint.src    2009-10-05 10:59:35 UTC (rev 461)
@@ -245,13 +245,13 @@
       else fprintf(f, "    ");
     fprintf(f, "%s", OP_names[*code]);
     break;
-    
+
     case OP_CLOSE:
     fprintf(f, "    %s %d", OP_names[*code], GET2(code, 1));
-    break;   
+    break;


     case OP_CREF:
-    case OP_NCREF: 
+    case OP_NCREF:
     fprintf(f, "%3d %s", GET2(code,1), OP_names[*code]);
     break;



Modified: code/trunk/pcre_study.c
===================================================================
--- code/trunk/pcre_study.c    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/pcre_study.c    2009-10-05 10:59:35 UTC (rev 461)
@@ -60,18 +60,18 @@
 *************************************************/


/* Scan a parenthesized group and compute the minimum length of subject that
-is needed to match it. This is a lower bound; it does not mean there is a
+is needed to match it. This is a lower bound; it does not mean there is a
string of that length that matches. In UTF8 mode, the result is in characters
rather than bytes.

 Arguments:
   code       pointer to start of group (the bracket)
   startcode  pointer to start of the whole pattern
-  options    the compiling options 
+  options    the compiling options


 Returns:   the minimum length
            -1 if \C was encountered
-           -2 internal error (missing capturing bracket) 
+           -2 internal error (missing capturing bracket)
 */


 static int
@@ -91,18 +91,18 @@
 for (;;)
   {
   int d, min;
-  uschar *cs, *ce; 
+  uschar *cs, *ce;
   register int op = *cc;
-  
+
   switch (op)
     {
     case OP_CBRA:
-    case OP_SCBRA: 
+    case OP_SCBRA:
     case OP_BRA:
-    case OP_SBRA: 
+    case OP_SBRA:
     case OP_ONCE:
     case OP_COND:
-    case OP_SCOND: 
+    case OP_SCOND:
     d = find_minlength(cc, startcode, options);
     if (d < 0) return d;
     branchlength += d;
@@ -119,12 +119,12 @@
     case OP_KETRMAX:
     case OP_KETRMIN:
     case OP_END:
-    if (length < 0 || (!had_recurse && branchlength < length)) 
+    if (length < 0 || (!had_recurse && branchlength < length))
       length = branchlength;
     if (*cc != OP_ALT) return length;
     cc += 1 + LINK_SIZE;
     branchlength = 0;
-    had_recurse = FALSE;   
+    had_recurse = FALSE;
     break;


     /* Skip over assertive subpatterns */
@@ -156,11 +156,11 @@
     case OP_WORD_BOUNDARY:
     cc += _pcre_OP_lengths[*cc];
     break;
-    
+
     /* Skip over a subpattern that has a {0} or {0,x} quantifier */


     case OP_BRAZERO:
-    case OP_BRAMINZERO: 
+    case OP_BRAMINZERO:
     case OP_SKIPZERO:
     cc += _pcre_OP_lengths[*cc];
     do cc += GET(cc, 1); while (*cc == OP_ALT);
@@ -184,10 +184,10 @@
     if (utf8 && cc[-1] >= 0xc0) cc += _pcre_utf8_table4[cc[-1] & 0x3f];
 #endif
     break;
-    
+
     case OP_TYPEPLUS:
     case OP_TYPEMINPLUS:
-    case OP_TYPEPOSPLUS:      
+    case OP_TYPEPOSPLUS:
     branchlength++;
     cc += (cc[1] == OP_PROP || cc[1] == OP_NOTPROP)? 4 : 2;
     break;
@@ -196,7 +196,7 @@
     need to skip over a multibyte character in UTF8 mode.  */


     case OP_EXACT:
-    case OP_NOTEXACT: 
+    case OP_NOTEXACT:
     branchlength += GET2(cc,1);
     cc += 4;
 #ifdef SUPPORT_UTF8
@@ -225,20 +225,20 @@
     case OP_ANY:
     case OP_ALLANY:
     case OP_EXTUNI:
-    case OP_HSPACE:  
+    case OP_HSPACE:
     case OP_NOT_HSPACE:
     case OP_VSPACE:
-    case OP_NOT_VSPACE:   
+    case OP_NOT_VSPACE:
     branchlength++;
     cc++;
     break;
-    
+
     /* "Any newline" might match two characters */
-    
+
     case OP_ANYNL:
     branchlength += 2;
     cc++;
-    break;     
+    break;


     /* The single-byte matcher means we can't proceed in UTF-8 mode */


@@ -248,7 +248,7 @@
 #endif
     branchlength++;
     cc++;
-    break;  
+    break;


     /* For repeated character types, we have to test for \p and \P, which have
     an extra two bytes of parameters. */
@@ -287,35 +287,35 @@
       case OP_CRPLUS:
       case OP_CRMINPLUS:
       branchlength++;
-      /* Fall through */ 
+      /* Fall through */


       case OP_CRSTAR:
       case OP_CRMINSTAR:
       case OP_CRQUERY:
       case OP_CRMINQUERY:
-      cc++; 
+      cc++;
       break;
-      
+
       case OP_CRRANGE:
       case OP_CRMINRANGE:
       branchlength += GET2(cc,1);
       cc += 5;
       break;
-      
+
       default:
       branchlength++;
-      break;   
+      break;
       }
     break;
-    
-    /* Backreferences and subroutine calls are treated in the same way: we find 
-    the minimum length for the subpattern. A recursion, however, causes an 
+
+    /* Backreferences and subroutine calls are treated in the same way: we find
+    the minimum length for the subpattern. A recursion, however, causes an
     a flag to be set that causes the length of this branch to be ignored. The
     logic is that a recursion can only make sense if there is another
     alternation that stops the recursing. That will provide the minimum length
     (when no recursion happens). A backreference within the group that it is
     referencing behaves in the same way. */
-    
+
     case OP_REF:
     ce = cs = (uschar *)_pcre_find_bracket(startcode, utf8, GET2(cc, 1));
     if (cs == NULL) return -2;
@@ -323,13 +323,13 @@
     if (cc > cs && cc < ce)
       {
       d = 0;
-      had_recurse = TRUE; 
-      }  
+      had_recurse = TRUE;
+      }
     else d = find_minlength(cs, startcode, options);
-    cc += 3; 
+    cc += 3;


     /* Handle repeated back references */
-     
+
     switch (*cc)
       {
       case OP_CRSTAR:
@@ -339,61 +339,61 @@
       min = 0;
       cc++;
       break;
-       
+
       case OP_CRRANGE:
       case OP_CRMINRANGE:
       min = GET2(cc, 1);
       cc += 5;
       break;
-      
+
       default:
       min = 1;
       break;
       }


     branchlength += min * d;
-    break; 
+    break;


-    case OP_RECURSE:  
+    case OP_RECURSE:
     cs = ce = (uschar *)startcode + GET(cc, 1);
     if (cs == NULL) return -2;
     do ce += GET(ce, 1); while (*ce == OP_ALT);
     if (cc > cs && cc < ce)
-      had_recurse = TRUE; 
-    else 
+      had_recurse = TRUE;
+    else
       branchlength += find_minlength(cs, startcode, options);
     cc += 1 + LINK_SIZE;
     break;


     /* Anything else does not or need not match a character. We can get the
-    item's length from the table, but for those that can match zero occurrences 
-    of a character, we must take special action for UTF-8 characters. */ 
-     
+    item's length from the table, but for those that can match zero occurrences
+    of a character, we must take special action for UTF-8 characters. */
+
     case OP_UPTO:
-    case OP_NOTUPTO: 
+    case OP_NOTUPTO:
     case OP_MINUPTO:
-    case OP_NOTMINUPTO: 
+    case OP_NOTMINUPTO:
     case OP_POSUPTO:
     case OP_STAR:
     case OP_MINSTAR:
-    case OP_NOTMINSTAR: 
+    case OP_NOTMINSTAR:
     case OP_POSSTAR:
-    case OP_NOTPOSSTAR: 
+    case OP_NOTPOSSTAR:
     case OP_QUERY:
     case OP_MINQUERY:
     case OP_NOTMINQUERY:
     case OP_POSQUERY:
-    case OP_NOTPOSQUERY: 
+    case OP_NOTPOSQUERY:
     cc += _pcre_OP_lengths[op];
-#ifdef SUPPORT_UTF8     
+#ifdef SUPPORT_UTF8
     if (utf8 && cc[-1] >= 0xc0) cc += _pcre_utf8_table4[cc[-1] & 0x3f];
-#endif 
+#endif
     break;


     /* For the record, these are the opcodes that are matched by "default":
     OP_ACCEPT, OP_CLOSE, OP_COMMIT, OP_FAIL, OP_PRUNE, OP_SET_SOM, OP_SKIP,
     OP_THEN. */
-     
+
     default:
     cc += _pcre_OP_lengths[op];
     break;
@@ -885,32 +885,32 @@
   (re->name_count * re->name_entry_size);


/* For an anchored pattern, or an unanchored pattern that has a first char, or
-a multiline pattern that matches only at "line starts", there is no point in
+a multiline pattern that matches only at "line starts", there is no point in
seeking a list of starting bytes. */

 if ((re->options & PCRE_ANCHORED) == 0 &&
     (re->flags & (PCRE_FIRSTSET|PCRE_STARTLINE)) == 0)
   {
   /* Set the character tables in the block that is passed around */
-  
+
   tables = re->tables;
   if (tables == NULL)
     (void)pcre_fullinfo(external_re, NULL, PCRE_INFO_DEFAULT_TABLES,
     (void *)(&tables));
-  
+
   compile_block.lcc = tables + lcc_offset;
   compile_block.fcc = tables + fcc_offset;
   compile_block.cbits = tables + cbits_offset;
   compile_block.ctypes = tables + ctypes_offset;
-  
+
   /* See if we can find a fixed set of initial characters for the pattern. */
-  
+
   memset(start_bits, 0, 32 * sizeof(uschar));
-  bits_set = set_start_bits(code, start_bits, 
-    (re->options & PCRE_CASELESS) != 0, (re->options & PCRE_UTF8) != 0, 
+  bits_set = set_start_bits(code, start_bits,
+    (re->options & PCRE_CASELESS) != 0, (re->options & PCRE_UTF8) != 0,
     &compile_block) == SSB_DONE;
   }
-   
+
 /* Find the minimum length of subject string. */


 min = find_minlength(code, code, re->options);
@@ -947,12 +947,12 @@
   study->flags |= PCRE_STUDY_MAPPED;
   memcpy(study->start_bits, start_bits, sizeof(start_bits));
   }
-  
+
 if (min >= 0)
   {
   study->flags |= PCRE_STUDY_MINLEN;
   study->minlength = min;
-  }     
+  }


return extra;
}
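
The lower-bound idea described in the find_minlength() comment above can be shown on a toy pattern tree. This is only a sketch under stated assumptions: the `node` type and `min_length` function are hypothetical and unrelated to PCRE's compiled opcode format. A sequence adds the minimum lengths of its parts, an alternation takes the smaller branch, and an optional item contributes zero.

```c
#include <stddef.h>

/* Hypothetical toy pattern tree, for illustration only. */
typedef enum { N_CHAR, N_SEQ, N_ALT, N_OPT } node_kind;

typedef struct node {
  node_kind kind;
  const struct node *left;   /* first child (N_SEQ, N_ALT, N_OPT) */
  const struct node *right;  /* second child (N_SEQ, N_ALT) */
} node;

/* Returns a lower bound on the length of any subject the tree can match. */
static int
min_length(const node *n)
{
  int a, b;
  switch (n->kind)
    {
    case N_CHAR: return 1;                 /* one character is required */
    case N_OPT:  return 0;                 /* may match nothing at all */
    case N_SEQ:  return min_length(n->left) + min_length(n->right);
    case N_ALT:                            /* shortest branch wins */
      a = min_length(n->left);
      b = min_length(n->right);
      return (a < b) ? a : b;
    }
  return 0;
}
```

For a tree representing a(b|cd)e? the result is 1 + min(1, 2) + 0, that is 2. As the comment in the diff notes, such a value is only a bound: it does not guarantee that a string of that length matches.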

Modified: code/trunk/pcre_try_flipped.c
===================================================================
--- code/trunk/pcre_try_flipped.c    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/pcre_try_flipped.c    2009-10-05 10:59:35 UTC (rev 461)
@@ -129,7 +129,7 @@
   *internal_study = *study;   /* To copy other fields */
   internal_study->size = byteflip(study->size, sizeof(study->size));
   internal_study->flags = byteflip(study->flags, sizeof(study->flags));
-  internal_study->minlength = byteflip(study->minlength, 
+  internal_study->minlength = byteflip(study->minlength,
     sizeof(study->minlength));
   }
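
The body of byteflip() is not part of this diff. As a rough sketch of what a helper with that calling shape might do, assuming it reverses the byte order of a 2-byte or 4-byte field, the hypothetical `flip_bytes` below takes the value and its size in bytes:

```c
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 2- or 4-byte value.
   `n` is the field size in bytes; any value other than 2 is treated as 4. */
static uint32_t
flip_bytes(uint32_t value, int n)
{
  if (n == 2)
    return ((value & 0x00ffu) << 8) | ((value & 0xff00u) >> 8);
  return ((value & 0x000000ffu) << 24) |
         ((value & 0x0000ff00u) <<  8) |
         ((value & 0x00ff0000u) >>  8) |
         ((value & 0xff000000u) >> 24);
}
```

Flipping is its own inverse, so applying it twice to the same field returns the original value.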



Modified: code/trunk/pcregrep.c
===================================================================
--- code/trunk/pcregrep.c    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/pcregrep.c    2009-10-05 10:59:35 UTC (rev 461)
@@ -1367,11 +1367,11 @@
 if (count_only)
   {
   if (count > 0 || !omit_zero_count)
-    { 
-    if (printname != NULL && filenames != FN_NONE) 
+    {
+    if (printname != NULL && filenames != FN_NONE)
       fprintf(stdout, "%s:", printname);
     fprintf(stdout, "%d\n", count);
-    } 
+    }
   }


 return rc;
@@ -1936,9 +1936,9 @@
       {
       char *opbra = strchr(op->long_name, '(');
       char *equals = strchr(op->long_name, '=');
- 
+
       /* Handle options with only one spelling of the name */
- 
+
       if (opbra == NULL)     /* Does not contain '(' */
         {
         if (equals == NULL)  /* Not thing=data case */
@@ -1961,36 +1961,36 @@
             }
           }
         }
-        
+
       /* Handle options with an alternate spelling of the name */
- 
-      else 
+
+      else
         {
         char buff1[24];
         char buff2[24];
-         
+
         int baselen = opbra - op->long_name;
         int fulllen = strchr(op->long_name, ')') - op->long_name + 1;
-        int arglen = (argequals == NULL || equals == NULL)? 
+        int arglen = (argequals == NULL || equals == NULL)?
           (int)strlen(arg) : argequals - arg;
- 
+
         sprintf(buff1, "%.*s", baselen, op->long_name);
         sprintf(buff2, "%s%.*s", buff1, fulllen - baselen - 2, opbra + 1);
-         
-        if (strncmp(arg, buff1, arglen) == 0 || 
+
+        if (strncmp(arg, buff1, arglen) == 0 ||
            strncmp(arg, buff2, arglen) == 0)
           {
           if (equals != NULL && argequals != NULL)
             {
-            option_data = argequals; 
+            option_data = argequals;
             if (*option_data == '=')
               {
-              option_data++;  
+              option_data++;
               longopwasequals = TRUE;
-              } 
-            }  
+              }
+            }
           break;
-          } 
+          }
         }
       }



Modified: code/trunk/pcreposix.c
===================================================================
--- code/trunk/pcreposix.c    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/pcreposix.c    2009-10-05 10:59:35 UTC (rev 461)
@@ -70,80 +70,80 @@
   REG_EESCAPE, /* \c at end of pattern */
   REG_EESCAPE, /* unrecognized character follows \ */
   REG_BADBR,   /* numbers out of order in {} quantifier */
-  /* 5 */ 
+  /* 5 */
   REG_BADBR,   /* number too big in {} quantifier */
   REG_EBRACK,  /* missing terminating ] for character class */
   REG_ECTYPE,  /* invalid escape sequence in character class */
   REG_ERANGE,  /* range out of order in character class */
   REG_BADRPT,  /* nothing to repeat */
-  /* 10 */ 
+  /* 10 */
   REG_BADRPT,  /* operand of unlimited repeat could match the empty string */
   REG_ASSERT,  /* internal error: unexpected repeat */
   REG_BADPAT,  /* unrecognized character after (? */
   REG_BADPAT,  /* POSIX named classes are supported only within a class */
   REG_EPAREN,  /* missing ) */
-  /* 15 */ 
+  /* 15 */
   REG_ESUBREG, /* reference to non-existent subpattern */
   REG_INVARG,  /* erroffset passed as NULL */
   REG_INVARG,  /* unknown option bit(s) set */
   REG_EPAREN,  /* missing ) after comment */
   REG_ESIZE,   /* parentheses nested too deeply */
-  /* 20 */ 
+  /* 20 */
   REG_ESIZE,   /* regular expression too large */
   REG_ESPACE,  /* failed to get memory */
   REG_EPAREN,  /* unmatched parentheses */
   REG_ASSERT,  /* internal error: code overflow */
   REG_BADPAT,  /* unrecognized character after (?< */
-  /* 25 */ 
+  /* 25 */
   REG_BADPAT,  /* lookbehind assertion is not fixed length */
   REG_BADPAT,  /* malformed number or name after (?( */
   REG_BADPAT,  /* conditional group contains more than two branches */
   REG_BADPAT,  /* assertion expected after (?( */
   REG_BADPAT,  /* (?R or (?[+-]digits must be followed by ) */
-  /* 30 */ 
+  /* 30 */
   REG_ECTYPE,  /* unknown POSIX class name */
   REG_BADPAT,  /* POSIX collating elements are not supported */
   REG_INVARG,  /* this version of PCRE is not compiled with PCRE_UTF8 support */
   REG_BADPAT,  /* spare error */
   REG_BADPAT,  /* character value in \x{...} sequence is too large */
-  /* 35 */ 
+  /* 35 */
   REG_BADPAT,  /* invalid condition (?(0) */
   REG_BADPAT,  /* \C not allowed in lookbehind assertion */
   REG_EESCAPE, /* PCRE does not support \L, \l, \N, \U, or \u */
   REG_BADPAT,  /* number after (?C is > 255 */
   REG_BADPAT,  /* closing ) for (?C expected */
-  /* 40 */ 
+  /* 40 */
   REG_BADPAT,  /* recursive call could loop indefinitely */
   REG_BADPAT,  /* unrecognized character after (?P */
   REG_BADPAT,  /* syntax error in subpattern name (missing terminator) */
   REG_BADPAT,  /* two named subpatterns have the same name */
   REG_BADPAT,  /* invalid UTF-8 string */
-  /* 45 */ 
+  /* 45 */
   REG_BADPAT,  /* support for \P, \p, and \X has not been compiled */
   REG_BADPAT,  /* malformed \P or \p sequence */
   REG_BADPAT,  /* unknown property name after \P or \p */
   REG_BADPAT,  /* subpattern name is too long (maximum 32 characters) */
   REG_BADPAT,  /* too many named subpatterns (maximum 10,000) */
-  /* 50 */ 
+  /* 50 */
   REG_BADPAT,  /* repeated subpattern is too long */
   REG_BADPAT,  /* octal value is greater than \377 (not in UTF-8 mode) */
   REG_BADPAT,  /* internal error: overran compiling workspace */
   REG_BADPAT,  /* internal error: previously-checked referenced subpattern not found */
   REG_BADPAT,  /* DEFINE group contains more than one branch */
-  /* 55 */ 
+  /* 55 */
   REG_BADPAT,  /* repeating a DEFINE group is not allowed */
   REG_INVARG,  /* inconsistent NEWLINE options */
  REG_BADPAT,  /* \g is not followed by an (optionally braced) non-zero number */
   REG_BADPAT,  /* a numbered reference must not be zero */
   REG_BADPAT,  /* (*VERB) with an argument is not supported */
   /* 60 */
-  REG_BADPAT,  /* (*VERB) not recognized */ 
+  REG_BADPAT,  /* (*VERB) not recognized */
   REG_BADPAT,  /* number is too big */
   REG_BADPAT,  /* subpattern name expected */
   REG_BADPAT,  /* digit expected after (?+ */
   REG_BADPAT,  /* ] is an invalid data character in JavaScript compatibility mode */
   /* 65 */
-  REG_BADPAT   /* different names for subpatterns of the same number are not allowed */ 
+  REG_BADPAT   /* different names for subpatterns of the same number are not allowed */
 };


/* Table of texts corresponding to POSIX error codes */
@@ -253,14 +253,14 @@
&erroffset, NULL);
preg->re_erroffset = erroffset;

-/* Safety: if the error code is too big for the translation vector (which
+/* Safety: if the error code is too big for the translation vector (which
should not happen, but we all make mistakes), return REG_BADPAT. */

-if (preg->re_pcre == NULL) 
+if (preg->re_pcre == NULL)
   {
   return (errorcode < sizeof(eint)/sizeof(const int))?
     eint[errorcode] : REG_BADPAT;
-  } 
+  }


preg->re_nsub = pcre_info((const pcre *)preg->re_pcre, NULL, NULL);
return 0;
@@ -302,7 +302,7 @@

((regex_t *)preg)->re_erroffset = (size_t)(-1); /* Only has meaning after compile */

-/* When no string data is being returned, or no vector has been passed in which
+/* When no string data is being returned, or no vector has been passed in which
to put it, ensure that nmatch is zero. Otherwise, ensure the vector for holding
the return data is large enough. */
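
The safety check in the pcreposix.c hunk above (fall back to REG_BADPAT when the error number is outside the translation vector) can be sketched on its own. Everything named `demo_` or `DEMO_` here is hypothetical: the codes are placeholders rather than the values from pcreposix.h, the table is shortened, and its first two entries are assumptions because the hunk starts mid-table.

```c
#include <stddef.h>

/* Hypothetical POSIX-style codes, for illustration only. */
enum { DEMO_BADPAT = 1, DEMO_EESCAPE = 2, DEMO_BADBR = 3 };

/* A shortened translation vector indexed by the library's error number. */
static const int demo_eint[] = {
  0,            /* no error (assumed entry) */
  DEMO_EESCAPE, /* assumed entry, not shown in the hunk */
  DEMO_EESCAPE, /* \c at end of pattern */
  DEMO_EESCAPE, /* unrecognized character follows \ */
  DEMO_BADBR    /* numbers out of order in {} quantifier */
};

/* Map a library error number to a POSIX-style code, returning the generic
   "bad pattern" code for any number the table does not cover. */
static int
translate_error(int errorcode)
{
  size_t count = sizeof(demo_eint) / sizeof(demo_eint[0]);
  if (errorcode < 0 || (size_t)errorcode >= count) return DEMO_BADPAT;
  return demo_eint[errorcode];
}
```

The sketch tests for negative values explicitly. In the original expression the comparison is against a `size_t`, so a negative `int` converts to a large unsigned value and takes the same fallback path.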


Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/pcretest.c    2009-10-05 10:59:35 UTC (rev 461)
@@ -1305,7 +1305,7 @@
     if ((options & PCRE_DOTALL) != 0) cflags |= REG_DOTALL;
     if ((options & PCRE_NO_AUTO_CAPTURE) != 0) cflags |= REG_NOSUB;
     if ((options & PCRE_UTF8) != 0) cflags |= REG_UTF8;
-    if ((options & PCRE_UNGREEDY) != 0) cflags |= REG_UNGREEDY; 
+    if ((options & PCRE_UNGREEDY) != 0) cflags |= REG_UNGREEDY;


     rc = regcomp(&preg, (char *)p, cflags);


@@ -1630,10 +1630,10 @@
           {
           uschar *start_bits = NULL;
           int minlength;
-           
+
           new_info(re, extra, PCRE_INFO_MINLENGTH, &minlength);
-          fprintf(outfile, "Subject length lower bound = %d\n", minlength);  
- 
+          fprintf(outfile, "Subject length lower bound = %d\n", minlength);
+
           new_info(re, extra, PCRE_INFO_FIRSTTABLE, &start_bits);
           if (start_bits == NULL)
             fprintf(outfile, "No set of starting bytes\n");
@@ -1977,7 +1977,7 @@
         case 'N':
         if ((options & PCRE_NOTEMPTY) != 0)
           options = (options & ~PCRE_NOTEMPTY) | PCRE_NOTEMPTY_ATSTART;
-        else    
+        else
           options |= PCRE_NOTEMPTY;
         continue;


@@ -2001,7 +2001,7 @@
         continue;


         case 'P':
-        options |= ((options & PCRE_PARTIAL_SOFT) == 0)? 
+        options |= ((options & PCRE_PARTIAL_SOFT) == 0)?
           PCRE_PARTIAL_SOFT : PCRE_PARTIAL_HARD;
         continue;


@@ -2377,8 +2377,8 @@
           {
           fprintf(outfile, ": ");
           pchars(bptr + use_offsets[0], use_offsets[1] - use_offsets[0],
-            outfile);   
-          }   
+            outfile);
+          }
         fprintf(outfile, "\n");
         break;  /* Out of the /g loop */
         }


Modified: code/trunk/perltest.pl
===================================================================
--- code/trunk/perltest.pl    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/perltest.pl    2009-10-05 10:59:35 UTC (rev 461)
@@ -90,11 +90,11 @@
   # Remove /8 from a UTF-8 pattern.


$utf8 = $pattern =~ s/8(?=[a-z]*$)//;
-
+
# Remove /J from a pattern with duplicate names.
-
- $pattern =~ s/J(?=[a-z]*$)//;

+ $pattern =~ s/J(?=[a-z]*$)//;
+
# Check that the pattern is valid

eval "\$_ =~ ${pattern}";

Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/testdata/testinput2    2009-10-05 10:59:35 UTC (rev 461)
@@ -3113,14 +3113,14 @@
     b"11111
     a"11111 


-/^(?|(a)(b)(c)(?<D>d)|(?<D>e)) (?('D')X|Y)/JDx
+/^(?|(a)(b)(c)(?<D>d)|(?<D>e)) (?('D')X|Y)/JDZx
     abcdX
     eX
     ** Failers
     abcdY
     ey     


-/(?<A>a) (b)(c)  (?<A>d  (?(R&A)$ | (?4)) )/JDx
+/(?<A>a) (b)(c)  (?<A>d  (?(R&A)$ | (?4)) )/JDZx
     abcdd
     ** Failers
     abcdde  


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2009-10-04 09:27:20 UTC (rev 460)
+++ code/trunk/testdata/testoutput2    2009-10-05 10:59:35 UTC (rev 461)
@@ -10274,36 +10274,36 @@
     a"11111 
 No match


-/^(?|(a)(b)(c)(?<D>d)|(?<D>e)) (?('D')X|Y)/JDx
+/^(?|(a)(b)(c)(?<D>d)|(?<D>e)) (?('D')X|Y)/JDZx
 ------------------------------------------------------------------
-  0  79 Bra
-  3     ^
-  4  43 Bra
-  7   7 CBra 1
- 12     a
- 14   7 Ket
- 17   7 CBra 2
- 22     b
- 24   7 Ket
- 27   7 CBra 3
- 32     c
- 34   7 Ket
- 37   7 CBra 4
- 42     d
- 44   7 Ket
- 47  13 Alt
- 50   7 CBra 1
- 55     e
- 57   7 Ket
- 60  56 Ket
- 63   8 Cond
- 66   4 Cond nref
- 69     X
- 71   5 Alt
- 74     Y
- 76  13 Ket
- 79  79 Ket
- 82     End
+        Bra
+        ^
+        Bra
+        CBra 1
+        a
+        Ket
+        CBra 2
+        b
+        Ket
+        CBra 3
+        c
+        Ket
+        CBra 4
+        d
+        Ket
+        Alt
+        CBra 1
+        e
+        Ket
+        Ket
+        Cond
+      4 Cond nref
+        X
+        Alt
+        Y
+        Ket
+        Ket
+        End
 ------------------------------------------------------------------
 Capturing subpattern count = 4
 Named capturing subpatterns:
@@ -10328,31 +10328,31 @@
     ey     
 No match


-/(?<A>a) (b)(c)  (?<A>d  (?(R&A)$ | (?4)) )/JDx
+/(?<A>a) (b)(c)  (?<A>d  (?(R&A)$ | (?4)) )/JDZx
 ------------------------------------------------------------------
-  0  65 Bra
-  3   7 CBra 1
-  8     a
- 10   7 Ket
- 13   7 CBra 2
- 18     b
- 20   7 Ket
- 23   7 CBra 3
- 28     c
- 30   7 Ket
- 33  29 CBra 4
- 38     d
- 40   7 Cond
- 43     Cond nrecurse 1
- 46     $
- 47  12 Alt
- 50   6 Once
- 53  33 Recurse
- 56   6 Ket
- 59  19 Ket
- 62  29 Ket
- 65  65 Ket
- 68     End
+        Bra
+        CBra 1
+        a
+        Ket
+        CBra 2
+        b
+        Ket
+        CBra 3
+        c
+        Ket
+        CBra 4
+        d
+        Cond
+        Cond nrecurse 1
+        $
+        Alt
+        Once
+        Recurse
+        Ket
+        Ket
+        Ket
+        Ket
+        End
 ------------------------------------------------------------------
 Capturing subpattern count = 4
 Named capturing subpatterns: