[Pcre-svn] [507] code/trunk: Tidies for 8.02-RC1 release.

From: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [507] code/trunk: Tidies for 8.02-RC1 release.
Revision: 507
          http://vcs.pcre.org/viewvc?view=rev&revision=507
Author:   ph10
Date:     2010-03-10 16:08:01 +0000 (Wed, 10 Mar 2010)


Log Message:
-----------
Tidies for 8.02-RC1 release.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/NEWS
    code/trunk/RunTest
    code/trunk/configure.ac
    code/trunk/doc/html/pcre.html
    code/trunk/doc/html/pcrepattern.html
    code/trunk/doc/html/pcreperform.html
    code/trunk/doc/html/pcresyntax.html
    code/trunk/doc/pcre.txt
    code/trunk/doc/pcrepattern.3
    code/trunk/maint/README
    code/trunk/pcre_compile.c
    code/trunk/pcre_dfa_exec.c
    code/trunk/pcre_globals.c
    code/trunk/pcre_internal.h
    code/trunk/pcre_tables.c
    code/trunk/pcreposix.c
    code/trunk/pcretest.c
    code/trunk/testdata/testinput12
    code/trunk/testdata/testinput6
    code/trunk/testdata/testoutput12
    code/trunk/testdata/testoutput6


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/ChangeLog    2010-03-10 16:08:01 UTC (rev 507)
@@ -1,71 +1,71 @@
 ChangeLog for PCRE
 ------------------


-Version 8.02 01-Mar-2010
+Version 8.02 10-Mar-2010
 ------------------------

 1.  The Unicode data tables have been updated to Unicode 5.2.0.

 2.  Added the option --libs-cpp to pcre-config, but only when C++ support is
     configured.
-    
+
 3.  Updated the licensing terms in the pcregexp.pas file, as agreed with the
-    original author of that file, following a query about its status. 
-    
-4.  On systems that do not have stdint.h (e.g. Solaris), check for and include 
+    original author of that file, following a query about its status.
+
+4.  On systems that do not have stdint.h (e.g. Solaris), check for and include
     inttypes.h instead. This fixes a bug that was introduced by change 8.01/8.
-    
+
 5.  A pattern such as (?&t)*+(?(DEFINE)(?<t>.)) which has a possessive
     quantifier applied to a forward-referencing subroutine call, could compile
     incorrect code or give the error "internal error: previously-checked
     referenced subpattern not found".
-    
-6.  Both MS Visual Studio and Symbian OS have problems with initializing 
+
+6.  Both MS Visual Studio and Symbian OS have problems with initializing
     variables to point to external functions. For these systems, therefore,
     pcre_malloc etc. are now initialized to local functions that call the
     relevant global functions.
-    
+
 7.  There were two entries missing in the vectors called coptable and poptable
     in pcre_dfa_exec.c. This could lead to memory accesses outsize the vectors.
-    I've fixed the data, and added a kludgy way of testing at compile time that 
-    the lengths are correct (equal to the number of opcodes).  
-    
-8.  Following on from 7, I added a similar kludge to check the length of the 
-    eint vector in pcreposix.c. 
-    
-9.  Error texts for pcre_compile() are held as one long string to avoid too 
-    much relocation at load time. To find a text, the string is searched, 
+    I've fixed the data, and added a kludgy way of testing at compile time that
+    the lengths are correct (equal to the number of opcodes).
+
+8.  Following on from 7, I added a similar kludge to check the length of the
+    eint vector in pcreposix.c.
+
+9.  Error texts for pcre_compile() are held as one long string to avoid too
+    much relocation at load time. To find a text, the string is searched,
     counting zeros. There was no check for running off the end of the string,
     which could happen if a new error number was added without updating the
-    string. 
-    
-10. \K gave a compile-time error if it appeared in a lookbehind assersion. 
-    
+    string.
+
+10. \K gave a compile-time error if it appeared in a lookbehind assersion.
+
 11. \K was not working if it appeared in an atomic group or in a group that
-    was called as a "subroutine", or in an assertion. Perl 5.11 documents that 
-    \K is "not well defined" if used in an assertion. PCRE now accepts it if 
-    the assertion is positive, but not if it is negative. 
-    
+    was called as a "subroutine", or in an assertion. Perl 5.11 documents that
+    \K is "not well defined" if used in an assertion. PCRE now accepts it if
+    the assertion is positive, but not if it is negative.
+
 12. Change 11 fortuitously reduced the size of the stack frame used in the
-    "match()" function of pcre_exec.c by one pointer. Forthcoming 
-    implementation of support for (*MARK) will need an extra pointer on the 
+    "match()" function of pcre_exec.c by one pointer. Forthcoming
+    implementation of support for (*MARK) will need an extra pointer on the
     stack; I have reserved it now, so that the stack frame size does not
     decrease.
-    
-13. A pattern such as (?P<L1>(?P<L2>0)|(?P>L2)(?P>L1)) in which the only other 
-    item in branch that calls a recursion is a subroutine call - as in the 
+
+13. A pattern such as (?P<L1>(?P<L2>0)|(?P>L2)(?P>L1)) in which the only other
+    item in branch that calls a recursion is a subroutine call - as in the
     second branch in the above example - was incorrectly given the compile-
     time error "recursive call could loop indefinitely" because pcre_compile()
-    was not correctly checking the subroutine for matching a non-empty string. 
-    
+    was not correctly checking the subroutine for matching a non-empty string.
+
 14. The checks for overrunning compiling workspace could trigger after an
-    overrun had occurred. This is a "should never occur" error, but it can be 
+    overrun had occurred. This is a "should never occur" error, but it can be
     triggered by pathological patterns such as hundreds of nested parentheses.
-    The checks now trigger 100 bytes before the end of the workspace. 
-    
-15. Fix typo in configure.ac: "srtoq" should be "strtoq". 
+    The checks now trigger 100 bytes before the end of the workspace.


+15. Fix typo in configure.ac: "srtoq" should be "strtoq".

+
Version 8.01 19-Jan-2010
------------------------
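
Note on ChangeLog item 9 above: the entry describes error texts held as one long string, located by counting zero terminators, with a newly added check for running off the end. The following is only a minimal Python sketch of that lookup idea, not the code in pcre_compile.c. The table contents and the names ERROR_TEXTS and find_error_text are hypothetical.

```python
# Hypothetical error table: every message is terminated by a NUL byte and
# all messages are packed into a single string (the layout item 9 describes).
ERROR_TEXTS = b"no error\0missing )\0nothing to repeat\0"


def find_error_text(n: int, table: bytes = ERROR_TEXTS) -> str:
    """Return the n-th message (0-based) by skipping n NUL terminators.

    Raises IndexError instead of reading past the end of the table, which
    is the kind of check item 9 says was previously missing.
    """
    pos = 0
    for _ in range(n):
        end = table.find(b"\0", pos)
        if end == -1:
            raise IndexError(f"error number {n} is beyond the table")
        pos = end + 1
    if pos >= len(table):
        raise IndexError(f"error number {n} is beyond the table")
    end = table.find(b"\0", pos)
    if end == -1:
        raise IndexError(f"error text {n} is not terminated")
    return table[pos:end].decode("ascii")
```

With this sketch, asking for a number that has no text fails with an explicit error rather than scanning whatever happens to follow the table.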


Modified: code/trunk/NEWS
===================================================================
--- code/trunk/NEWS    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/NEWS    2010-03-10 16:08:01 UTC (rev 507)
@@ -1,6 +1,12 @@
 News about PCRE releases
 ------------------------


+Release 8.02 10-Mar-2010
+------------------------
+
+Another bug-fix release.
+
+
Release 8.01 19-Jan-2010
------------------------


Modified: code/trunk/RunTest
===================================================================
--- code/trunk/RunTest    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/RunTest    2010-03-10 16:08:01 UTC (rev 507)
@@ -133,10 +133,10 @@
 echo PCRE C library tests
 ./pcretest /dev/null


-# Primary test, Perl-compatible for both 5.8 and 5.10
+# Primary test, compatible with Perl 5.8, 5.10, 5.11

 if [ $do1 = yes ] ; then
-  echo "Test 1: main functionality (Perl 5.8 & 5.10 compatible)"
+  echo "Test 1: main functionality (Perl 5.8, 5.10, 5.11 compatible)"
   $valgrind ./pcretest -q $testdata/testinput1 testtry
   if [ $? = 0 ] ; then
     $cf $testdata/testoutput1 testtry
@@ -215,7 +215,7 @@
 # Additional tests for UTF8 support


 if [ $do4 = yes ] ; then
-  echo "Test 4: UTF-8 support (Perl 5.8 & 5.10 compatible)"
+  echo "Test 4: UTF-8 support (Perl 5.8, 5.10, 5.11 compatible)"
   $valgrind ./pcretest -q $testdata/testinput4 testtry
   if [ $? = 0 ] ; then
     $cf $testdata/testoutput4 testtry
@@ -237,7 +237,7 @@
 fi


 if [ $do6 = yes ] ; then
-  echo "Test 6: Unicode property support (Perl 5.10 compatible)"
+  echo "Test 6: Unicode property support (Perl 5.10, 5.11 compatible)"
   $valgrind ./pcretest -q $testdata/testinput6 testtry
   if [ $? = 0 ] ; then
     $cf $testdata/testoutput6 testtry
@@ -299,10 +299,10 @@
   echo "OK"
 fi


-# Test of Perl 5.10 features
+# Test of Perl 5.10, 5.11 features

 if [ $do11 = yes ] ; then
-  echo "Test 11: Perl 5.10 features"
+  echo "Test 11: Perl 5.10, 5.11 features"
   $valgrind ./pcretest -q $testdata/testinput11 testtry
   if [ $? = 0 ] ; then
     $cf $testdata/testoutput11 testtry


Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/configure.ac    2010-03-10 16:08:01 UTC (rev 507)
@@ -11,7 +11,7 @@
 m4_define(pcre_major, [8])
 m4_define(pcre_minor, [02])
 m4_define(pcre_prerelease, [-RC1])
-m4_define(pcre_date, [2010-03-01])
+m4_define(pcre_date, [2010-03-10])


 # Libtool shared library interface versions (current:revision:age)
 m4_define(libpcre_version, [0:1:0])
@@ -110,7 +110,7 @@
               AS_HELP_STRING([--disable-cpp],
                              [disable C++ support]),
               , enable_cpp=yes)
-AC_SUBST(enable_cpp)              
+AC_SUBST(enable_cpp)


# Handle --enable-rebuild-chartables
AC_ARG_ENABLE(rebuild-chartables,

Modified: code/trunk/doc/html/pcre.html
===================================================================
--- code/trunk/doc/html/pcre.html    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/doc/html/pcre.html    2010-03-10 16:08:01 UTC (rev 507)
@@ -33,7 +33,7 @@
 The current implementation of PCRE corresponds approximately with Perl 5.10,
 including support for UTF-8 encoded strings and Unicode general category
 properties. However, UTF-8 and Unicode support has to be explicitly enabled; it
-is not the default. The Unicode tables correspond to Unicode release 5.1.
+is not the default. The Unicode tables correspond to Unicode release 5.2.0.
 </P>
 <P>
 In addition to the Perl-compatible matching function, PCRE contains an
@@ -298,9 +298,9 @@
 </P>
 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 28 September 2009
+Last updated: 01 March 2010
 <br>
-Copyright &copy; 1997-2009 University of Cambridge.
+Copyright &copy; 1997-2010 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.


Modified: code/trunk/doc/html/pcrepattern.html
===================================================================
--- code/trunk/doc/html/pcrepattern.html    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/doc/html/pcrepattern.html    2010-03-10 16:08:01 UTC (rev 507)
@@ -511,13 +511,17 @@
 <P>
 Arabic,
 Armenian,
+Avestan,
 Balinese,
+Bamum,
 Bengali,
 Bopomofo,
 Braille,
 Buginese,
 Buhid,
 Canadian_Aboriginal,
+Carian,
+Cham,
 Cherokee,
 Common,
 Coptic,
@@ -526,6 +530,7 @@
 Cyrillic,
 Deseret,
 Devanagari,
+Egyptian_Hieroglyphs,
 Ethiopic,
 Georgian,
 Glagolitic,
@@ -538,16 +543,27 @@
 Hanunoo,
 Hebrew,
 Hiragana,
+Imperial_Aramaic,
 Inherited,
+Inscriptional_Pahlavi,
+Inscriptional_Parthian,
+Javanese,
+Kaithi,
 Kannada,
 Katakana,
+Kayah_Li,
 Kharoshthi,
 Khmer,
 Lao,
 Latin,
+Lepcha,
 Limbu,
 Linear_B,
+Lisu,
+Lycian,
+Lydian,
 Malayalam,
+Meetei_Mayek,
 Mongolian,
 Myanmar,
 New_Tai_Lue,
@@ -555,18 +571,27 @@
 Ogham,
 Old_Italic,
 Old_Persian,
+Old_South_Arabian,
+Old_Turkic,
+Ol_Chiki,
 Oriya,
 Osmanya,
 Phags_Pa,
 Phoenician,
+Rejang,
 Runic,
+Samaritan,
+Saurashtra,
 Shavian,
 Sinhala,
+Sundanese,
 Syloti_Nagri,
 Syriac,
 Tagalog,
 Tagbanwa,
 Tai_Le,
+Tai_Tham,
+Tai_Viet,
 Tamil,
 Telugu,
 Thaana,
@@ -574,6 +599,7 @@
 Tibetan,
 Tifinagh,
 Ugaritic,
+Vai,
 Yi.
 </P>
 <P>
@@ -705,6 +731,11 @@
   (foo)\Kbar
 </pre>
 matches "foobar", the first substring is still set to "foo".
+</P>
+<P>
+Perl documents that the use of \K within assertions is "not well defined". In
+PCRE, \K is acted upon when it occurs inside positive assertions, but is
+ignored in negative assertions.
 <a name="smallassertions"></a></P>
 <br><b>
 Simple assertions
@@ -2396,7 +2427,7 @@
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 11 January 2010
+Last updated: 06 March 2010
 <br>
 Copyright &copy; 1997-2010 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcreperform.html
===================================================================
--- code/trunk/doc/html/pcreperform.html    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/doc/html/pcreperform.html    2010-03-10 16:08:01 UTC (rev 507)
@@ -21,14 +21,15 @@
 of them.
 </P>
 <br><b>
-MEMORY USAGE
+COMPILED PATTERN MEMORY USAGE
 </b><br>
 <P>
 Patterns are compiled by PCRE into a reasonably efficient byte code, so that
 most simple patterns do not use much memory. However, there is one case where
-memory usage can be unexpectedly large. When a parenthesized subpattern has a
-quantifier with a minimum greater than 1 and/or a limited maximum, the whole
-subpattern is repeated in the compiled code. For example, the pattern
+the memory usage of a compiled pattern can be unexpectedly large. If a
+parenthesized subpattern has a quantifier with a minimum greater than 1 and/or
+a limited maximum, the whole subpattern is repeated in the compiled code. For
+example, the pattern
 <pre>
   (abc|def){2,4}
 </pre>
@@ -73,6 +74,18 @@
 that PCRE cannot otherwise handle.
 </P>
 <br><b>
+STACK USAGE AT RUN TIME
+</b><br>
+<P>
+When <b>pcre_exec()</b> is used for matching, certain kinds of pattern can cause
+it to use large amounts of the process stack. In some environments the default
+process stack is quite small, and if it runs out the result is often SIGSEGV.
+This issue is probably the most frequently raised problem with PCRE. Rewriting
+your pattern can often help. The
+<a href="pcrestack.html"><b>pcrestack</b></a>
+documentation discusses this issue in detail.
+</P>
+<br><b>
 PROCESSING TIME
 </b><br>
 <P>
@@ -164,9 +177,9 @@
 REVISION
 </b><br>
 <P>
-Last updated: 06 March 2007
+Last updated: 07 March 2010
 <br>
-Copyright &copy; 1997-2007 University of Cambridge.
+Copyright &copy; 1997-2010 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
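
Note on the pcreperform.html change above: the text says a parenthesized subpattern with a minimum greater than 1 and/or a limited maximum is repeated in the compiled code. The Python sketch below only illustrates that idea at the pattern level, by rewriting the quantifier as explicit copies; it does not reproduce PCRE's actual byte code, and the helper name expand_repeat is hypothetical.

```python
import re


def expand_repeat(sub: str, minimum: int, maximum: int) -> str:
    """Rewrite (sub){minimum,maximum} as explicit copies of sub.

    The required copies come first, followed by nested optional copies,
    so the result contains `maximum` copies of the subpattern in total.
    """
    group = f"(?:{sub})"
    required = group * minimum
    optional = ""
    for _ in range(maximum - minimum):
        optional = f"(?:{group}{optional})?"
    return required + optional


# The documentation's example, (abc|def){2,4}, written out in full.
expanded = expand_repeat("abc|def", 2, 4)
original = r"(?:abc|def){2,4}"
```

Here `expanded` contains four copies of `abc|def`, which is why the size of such a pattern grows with the repeat counts even though the source pattern stays short.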


Modified: code/trunk/doc/html/pcresyntax.html
===================================================================
--- code/trunk/doc/html/pcresyntax.html    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/doc/html/pcresyntax.html    2010-03-10 16:08:01 UTC (rev 507)
@@ -146,7 +146,9 @@
 <P>
 Arabic,
 Armenian,
+Avestan,
 Balinese,
+Bamum,
 Bengali,
 Bopomofo,
 Braille,
@@ -163,6 +165,7 @@
 Cyrillic,
 Deseret,
 Devanagari,
+Egyptian_Hieroglyphs,
 Ethiopic,
 Georgian,
 Glagolitic,
@@ -175,7 +178,12 @@
 Hanunoo,
 Hebrew,
 Hiragana,
+Imperial_Aramaic,
 Inherited,
+Inscriptional_Pahlavi,
+Inscriptional_Parthian,
+Javanese,
+Kaithi,
 Kannada,
 Katakana,
 Kayah_Li,
@@ -186,9 +194,11 @@
 Lepcha,
 Limbu,
 Linear_B,
+Lisu,
 Lycian,
 Lydian,
 Malayalam,
+Meetei_Mayek,
 Mongolian,
 Myanmar,
 New_Tai_Lue,
@@ -196,6 +206,8 @@
 Ogham,
 Old_Italic,
 Old_Persian,
+Old_South_Arabian,
+Old_Turkic,
 Ol_Chiki,
 Oriya,
 Osmanya,
@@ -203,15 +215,18 @@
 Phoenician,
 Rejang,
 Runic,
+Samaritan,
 Saurashtra,
 Shavian,
 Sinhala,
-Sudanese,
+Sundanese,
 Syloti_Nagri,
 Syriac,
 Tagalog,
 Tagbanwa,
 Tai_Le,
+Tai_Tham,
+Tai_Viet,
 Tamil,
 Telugu,
 Thaana,
@@ -464,9 +479,9 @@
 </P>
 <br><a name="SEC26" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 11 April 2009
+Last updated: 01 March 2010
 <br>
-Copyright &copy; 1997-2009 University of Cambridge.
+Copyright &copy; 1997-2010 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.


Modified: code/trunk/doc/pcre.txt
===================================================================
--- code/trunk/doc/pcre.txt    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/doc/pcre.txt    2010-03-10 16:08:01 UTC (rev 507)
@@ -29,7 +29,7 @@
        5.10, including support for UTF-8 encoded strings and  Unicode  general
        category  properties.  However,  UTF-8  and  Unicode  support has to be
        explicitly enabled; it is not the default. The  Unicode  tables  corre-
-       spond to Unicode release 5.1.
+       spond to Unicode release 5.2.0.


        In  addition to the Perl-compatible matching function, PCRE contains an
        alternative function that matches the same compiled patterns in a  dif-
@@ -263,8 +263,8 @@


REVISION

-       Last updated: 28 September 2009
-       Copyright (c) 1997-2009 University of Cambridge.
+       Last updated: 01 March 2010
+       Copyright (c) 1997-2010 University of Cambridge.
 ------------------------------------------------------------------------------



@@ -3488,24 +3488,29 @@
        Those that are not part of an identified script are lumped together  as
        "Common". The current list of scripts is:


-       Arabic,  Armenian,  Balinese,  Bengali,  Bopomofo,  Braille,  Buginese,
-       Buhid,  Canadian_Aboriginal,  Cherokee,  Common,   Coptic,   Cuneiform,
-       Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic,
-       Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew,  Hira-
-       gana,  Inherited,  Kannada,  Katakana,  Kharoshthi,  Khmer, Lao, Latin,
-       Limbu,  Linear_B,  Malayalam,  Mongolian,  Myanmar,  New_Tai_Lue,  Nko,
-       Ogham,  Old_Italic,  Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician,
-       Runic,  Shavian,  Sinhala,  Syloti_Nagri,  Syriac,  Tagalog,  Tagbanwa,
-       Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi.
+       Arabic, Armenian, Avestan, Balinese, Bamum, Bengali, Bopomofo, Braille,
+       Buginese, Buhid, Canadian_Aboriginal, Carian, Cham,  Cherokee,  Common,
+       Coptic,   Cuneiform,  Cypriot,  Cyrillic,  Deseret,  Devanagari,  Egyp-
+       tian_Hieroglyphs,  Ethiopic,  Georgian,  Glagolitic,   Gothic,   Greek,
+       Gujarati,  Gurmukhi,  Han,  Hangul,  Hanunoo,  Hebrew,  Hiragana, Impe-
+       rial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian,
+       Javanese,  Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao,
+       Latin,  Lepcha,  Limbu,  Linear_B,  Lisu,  Lycian,  Lydian,  Malayalam,
+       Meetei_Mayek,  Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Old_Italic,
+       Old_Persian, Old_South_Arabian, Old_Turkic, Ol_Chiki,  Oriya,  Osmanya,
+       Phags_Pa,  Phoenician,  Rejang,  Runic, Samaritan, Saurashtra, Shavian,
+       Sinhala, Sundanese, Syloti_Nagri, Syriac,  Tagalog,  Tagbanwa,  Tai_Le,
+       Tai_Tham,  Tai_Viet,  Tamil,  Telugu,  Thaana, Thai, Tibetan, Tifinagh,
+       Ugaritic, Vai, Yi.


-       Each  character has exactly one general category property, specified by
+       Each character has exactly one general category property, specified  by
        a two-letter abbreviation. For compatibility with Perl, negation can be
-       specified  by  including a circumflex between the opening brace and the
+       specified by including a circumflex between the opening brace  and  the
        property name. For example, \p{^Lu} is the same as \P{Lu}.


        If only one letter is specified with \p or \P, it includes all the gen-
-       eral  category properties that start with that letter. In this case, in
-       the absence of negation, the curly brackets in the escape sequence  are
+       eral category properties that start with that letter. In this case,  in
+       the  absence of negation, the curly brackets in the escape sequence are
        optional; these two examples have the same effect:


          \p{L}
@@ -3557,69 +3562,73 @@
          Zp    Paragraph separator
          Zs    Space separator


-       The  special property L& is also supported: it matches a character that
-       has the Lu, Ll, or Lt property, in other words, a letter  that  is  not
+       The special property L& is also supported: it matches a character  that
+       has  the  Lu,  Ll, or Lt property, in other words, a letter that is not
        classified as a modifier or "other".


-       The  Cs  (Surrogate)  property  applies only to characters in the range
-       U+D800 to U+DFFF. Such characters are not valid in UTF-8  strings  (see
+       The Cs (Surrogate) property applies only to  characters  in  the  range
+       U+D800  to  U+DFFF. Such characters are not valid in UTF-8 strings (see
        RFC 3629) and so cannot be tested by PCRE, unless UTF-8 validity check-
-       ing has been turned off (see the discussion  of  PCRE_NO_UTF8_CHECK  in
+       ing  has  been  turned off (see the discussion of PCRE_NO_UTF8_CHECK in
        the pcreapi page). Perl does not support the Cs property.


-       The  long  synonyms  for  property  names  that  Perl supports (such as
-       \p{Letter}) are not supported by PCRE, nor is it  permitted  to  prefix
+       The long synonyms for  property  names  that  Perl  supports  (such  as
+       \p{Letter})  are  not  supported by PCRE, nor is it permitted to prefix
        any of these properties with "Is".


        No character that is in the Unicode table has the Cn (unassigned) prop-
        erty.  Instead, this property is assumed for any code point that is not
        in the Unicode table.


-       Specifying  caseless  matching  does not affect these escape sequences.
+       Specifying caseless matching does not affect  these  escape  sequences.
        For example, \p{Lu} always matches only upper case letters.


-       The \X escape matches any number of Unicode  characters  that  form  an
+       The  \X  escape  matches  any number of Unicode characters that form an
        extended Unicode sequence. \X is equivalent to


          (?>\PM\pM*)


-       That  is,  it matches a character without the "mark" property, followed
-       by zero or more characters with the "mark"  property,  and  treats  the
-       sequence  as  an  atomic group (see below).  Characters with the "mark"
-       property are typically accents that  affect  the  preceding  character.
-       None  of  them  have  codepoints less than 256, so in non-UTF-8 mode \X
+       That is, it matches a character without the "mark"  property,  followed
+       by  zero  or  more  characters with the "mark" property, and treats the
+       sequence as an atomic group (see below).  Characters  with  the  "mark"
+       property  are  typically  accents  that affect the preceding character.
+       None of them have codepoints less than 256, so  in  non-UTF-8  mode  \X
        matches any one character.


-       Matching characters by Unicode property is not fast, because  PCRE  has
-       to  search  a  structure  that  contains data for over fifteen thousand
+       Matching  characters  by Unicode property is not fast, because PCRE has
+       to search a structure that contains  data  for  over  fifteen  thousand
        characters. That is why the traditional escape sequences such as \d and
        \w do not use Unicode properties in PCRE.


    Resetting the match start


        The escape sequence \K, which is a Perl 5.10 feature, causes any previ-
-       ously matched characters not  to  be  included  in  the  final  matched
+       ously  matched  characters  not  to  be  included  in the final matched
        sequence. For example, the pattern:


          foo\Kbar


-       matches  "foobar",  but reports that it has matched "bar". This feature
-       is similar to a lookbehind assertion (described  below).   However,  in
-       this  case, the part of the subject before the real match does not have
-       to be of fixed length, as lookbehind assertions do. The use of \K  does
-       not  interfere  with  the setting of captured substrings.  For example,
+       matches "foobar", but reports that it has matched "bar".  This  feature
+       is  similar  to  a lookbehind assertion (described below).  However, in
+       this case, the part of the subject before the real match does not  have
+       to  be of fixed length, as lookbehind assertions do. The use of \K does
+       not interfere with the setting of captured  substrings.   For  example,
        when the pattern


          (foo)\Kbar


        matches "foobar", the first substring is still set to "foo".


+       Perl  documents  that  the  use  of  \K  within assertions is "not well
+       defined". In PCRE, \K is acted upon  when  it  occurs  inside  positive
+       assertions, but is ignored in negative assertions.
+
    Simple assertions


-       The final use of backslash is for certain simple assertions. An  asser-
-       tion  specifies a condition that has to be met at a particular point in
-       a match, without consuming any characters from the subject string.  The
-       use  of subpatterns for more complicated assertions is described below.
+       The  final use of backslash is for certain simple assertions. An asser-
+       tion specifies a condition that has to be met at a particular point  in
+       a  match, without consuming any characters from the subject string. The
+       use of subpatterns for more complicated assertions is described  below.
        The backslashed assertions are:


          \b     matches at a word boundary
@@ -3630,44 +3639,44 @@
          \z     matches only at the end of the subject
          \G     matches at the first matching position in the subject


-       These assertions may not appear in character classes (but note that  \b
+       These  assertions may not appear in character classes (but note that \b
        has a different meaning, namely the backspace character, inside a char-
        acter class).


-       A word boundary is a position in the subject string where  the  current
-       character  and  the previous character do not both match \w or \W (i.e.
-       one matches \w and the other matches \W), or the start or  end  of  the
+       A  word  boundary is a position in the subject string where the current
+       character and the previous character do not both match \w or  \W  (i.e.
+       one  matches  \w  and the other matches \W), or the start or end of the
        string if the first or last character matches \w, respectively. Neither
-       PCRE nor Perl has a separte "start of word" or "end  of  word"  metase-
-       quence.  However,  whatever follows \b normally determines which it is.
+       PCRE  nor  Perl  has a separte "start of word" or "end of word" metase-
+       quence. However, whatever follows \b normally determines which  it  is.
        For example, the fragment \ba matches "a" at the start of a word.


-       The \A, \Z, and \z assertions differ from  the  traditional  circumflex
+       The  \A,  \Z,  and \z assertions differ from the traditional circumflex
        and dollar (described in the next section) in that they only ever match
-       at the very start and end of the subject string, whatever  options  are
-       set.  Thus,  they are independent of multiline mode. These three asser-
+       at  the  very start and end of the subject string, whatever options are
+       set. Thus, they are independent of multiline mode. These  three  asser-
        tions are not affected by the PCRE_NOTBOL or PCRE_NOTEOL options, which
-       affect  only the behaviour of the circumflex and dollar metacharacters.
-       However, if the startoffset argument of pcre_exec() is non-zero,  indi-
+       affect only the behaviour of the circumflex and dollar  metacharacters.
+       However,  if the startoffset argument of pcre_exec() is non-zero, indi-
        cating that matching is to start at a point other than the beginning of
-       the subject, \A can never match. The difference between \Z  and  \z  is
+       the  subject,  \A  can never match. The difference between \Z and \z is
        that \Z matches before a newline at the end of the string as well as at
        the very end, whereas \z matches only at the end.


-       The \G assertion is true only when the current matching position is  at
-       the  start point of the match, as specified by the startoffset argument
-       of pcre_exec(). It differs from \A when the  value  of  startoffset  is
-       non-zero.  By calling pcre_exec() multiple times with appropriate argu-
+       The  \G assertion is true only when the current matching position is at
+       the start point of the match, as specified by the startoffset  argument
+       of  pcre_exec().  It  differs  from \A when the value of startoffset is
+       non-zero. By calling pcre_exec() multiple times with appropriate  argu-
        ments, you can mimic Perl's /g option, and it is in this kind of imple-
        mentation where \G can be useful.


-       Note,  however,  that  PCRE's interpretation of \G, as the start of the
+       Note, however, that PCRE's interpretation of \G, as the  start  of  the
        current match, is subtly different from Perl's, which defines it as the
-       end  of  the  previous  match. In Perl, these can be different when the
-       previously matched string was empty. Because PCRE does just  one  match
+       end of the previous match. In Perl, these can  be  different  when  the
+       previously  matched  string was empty. Because PCRE does just one match
        at a time, it cannot reproduce this behaviour.


-       If  all  the alternatives of a pattern begin with \G, the expression is
+       If all the alternatives of a pattern begin with \G, the  expression  is
        anchored to the starting match position, and the "anchored" flag is set
        in the compiled regular expression.


@@ -3675,90 +3684,90 @@
CIRCUMFLEX AND DOLLAR

        Outside a character class, in the default matching mode, the circumflex
-       character is an assertion that is true only  if  the  current  matching
-       point  is  at the start of the subject string. If the startoffset argu-
-       ment of pcre_exec() is non-zero, circumflex  can  never  match  if  the
-       PCRE_MULTILINE  option  is  unset. Inside a character class, circumflex
+       character  is  an  assertion  that is true only if the current matching
+       point is at the start of the subject string. If the  startoffset  argu-
+       ment  of  pcre_exec()  is  non-zero,  circumflex can never match if the
+       PCRE_MULTILINE option is unset. Inside a  character  class,  circumflex
        has an entirely different meaning (see below).


-       Circumflex need not be the first character of the pattern if  a  number
-       of  alternatives are involved, but it should be the first thing in each
-       alternative in which it appears if the pattern is ever  to  match  that
-       branch.  If all possible alternatives start with a circumflex, that is,
-       if the pattern is constrained to match only at the start  of  the  sub-
-       ject,  it  is  said  to be an "anchored" pattern. (There are also other
+       Circumflex  need  not be the first character of the pattern if a number
+       of alternatives are involved, but it should be the first thing in  each
+       alternative  in  which  it appears if the pattern is ever to match that
+       branch. If all possible alternatives start with a circumflex, that  is,
+       if  the  pattern  is constrained to match only at the start of the sub-
+       ject, it is said to be an "anchored" pattern.  (There  are  also  other
        constructs that can cause a pattern to be anchored.)


-       A dollar character is an assertion that is true  only  if  the  current
-       matching  point  is  at  the  end of the subject string, or immediately
+       A  dollar  character  is  an assertion that is true only if the current
+       matching point is at the end of  the  subject  string,  or  immediately
        before a newline at the end of the string (by default). Dollar need not
-       be  the  last  character of the pattern if a number of alternatives are
-       involved, but it should be the last item in  any  branch  in  which  it
+       be the last character of the pattern if a number  of  alternatives  are
+       involved,  but  it  should  be  the last item in any branch in which it
        appears. Dollar has no special meaning in a character class.


-       The  meaning  of  dollar  can be changed so that it matches only at the
-       very end of the string, by setting the  PCRE_DOLLAR_ENDONLY  option  at
+       The meaning of dollar can be changed so that it  matches  only  at  the
+       very  end  of  the string, by setting the PCRE_DOLLAR_ENDONLY option at
        compile time. This does not affect the \Z assertion.


        The meanings of the circumflex and dollar characters are changed if the
-       PCRE_MULTILINE option is set. When  this  is  the  case,  a  circumflex
-       matches  immediately after internal newlines as well as at the start of
-       the subject string. It does not match after a  newline  that  ends  the
-       string.  A dollar matches before any newlines in the string, as well as
-       at the very end, when PCRE_MULTILINE is set. When newline is  specified
-       as  the  two-character  sequence CRLF, isolated CR and LF characters do
+       PCRE_MULTILINE  option  is  set.  When  this  is the case, a circumflex
+       matches immediately after internal newlines as well as at the start  of
+       the  subject  string.  It  does not match after a newline that ends the
+       string. A dollar matches before any newlines in the string, as well  as
+       at  the very end, when PCRE_MULTILINE is set. When newline is specified
+       as the two-character sequence CRLF, isolated CR and  LF  characters  do
        not indicate newlines.


-       For example, the pattern /^abc$/ matches the subject string  "def\nabc"
-       (where  \n  represents a newline) in multiline mode, but not otherwise.
-       Consequently, patterns that are anchored in single  line  mode  because
-       all  branches  start  with  ^ are not anchored in multiline mode, and a
-       match for circumflex is  possible  when  the  startoffset  argument  of
-       pcre_exec()  is  non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if
+       For  example, the pattern /^abc$/ matches the subject string "def\nabc"
+       (where \n represents a newline) in multiline mode, but  not  otherwise.
+       Consequently,  patterns  that  are anchored in single line mode because
+       all branches start with ^ are not anchored in  multiline  mode,  and  a
+       match  for  circumflex  is  possible  when  the startoffset argument of
+       pcre_exec() is non-zero. The PCRE_DOLLAR_ENDONLY option is  ignored  if
        PCRE_MULTILINE is set.


-       Note that the sequences \A, \Z, and \z can be used to match  the  start
-       and  end of the subject in both modes, and if all branches of a pattern
-       start with \A it is always anchored, whether or not  PCRE_MULTILINE  is
+       Note  that  the sequences \A, \Z, and \z can be used to match the start
+       and end of the subject in both modes, and if all branches of a  pattern
+       start  with  \A it is always anchored, whether or not PCRE_MULTILINE is
        set.



FULL STOP (PERIOD, DOT)

        Outside a character class, a dot in the pattern matches any one charac-
-       ter in the subject string except (by default) a character  that  signi-
-       fies  the  end  of  a line. In UTF-8 mode, the matched character may be
+       ter  in  the subject string except (by default) a character that signi-
+       fies the end of a line. In UTF-8 mode, the  matched  character  may  be
        more than one byte long.


-       When a line ending is defined as a single character, dot never  matches
-       that  character; when the two-character sequence CRLF is used, dot does
-       not match CR if it is immediately followed  by  LF,  but  otherwise  it
-       matches  all characters (including isolated CRs and LFs). When any Uni-
-       code line endings are being recognized, dot does not match CR or LF  or
+       When  a line ending is defined as a single character, dot never matches
+       that character; when the two-character sequence CRLF is used, dot  does
+       not  match  CR  if  it  is immediately followed by LF, but otherwise it
+       matches all characters (including isolated CRs and LFs). When any  Uni-
+       code  line endings are being recognized, dot does not match CR or LF or
        any of the other line ending characters.


-       The  behaviour  of  dot  with regard to newlines can be changed. If the
-       PCRE_DOTALL option is set, a dot matches  any  one  character,  without
+       The behaviour of dot with regard to newlines can  be  changed.  If  the
+       PCRE_DOTALL  option  is  set,  a dot matches any one character, without
        exception. If the two-character sequence CRLF is present in the subject
        string, it takes two dots to match it.


-       The handling of dot is entirely independent of the handling of  circum-
-       flex  and  dollar,  the  only relationship being that they both involve
+       The  handling of dot is entirely independent of the handling of circum-
+       flex and dollar, the only relationship being  that  they  both  involve
        newlines. Dot has no special meaning in a character class.



MATCHING A SINGLE BYTE

        Outside a character class, the escape sequence \C matches any one byte,
-       both  in  and  out  of  UTF-8 mode. Unlike a dot, it always matches any
-       line-ending characters. The feature is provided in  Perl  in  order  to
-       match  individual bytes in UTF-8 mode. Because it breaks up UTF-8 char-
-       acters into individual bytes, what remains in the string may be a  mal-
-       formed  UTF-8  string.  For this reason, the \C escape sequence is best
+       both in and out of UTF-8 mode. Unlike a  dot,  it  always  matches  any
+       line-ending  characters.  The  feature  is provided in Perl in order to
+       match individual bytes in UTF-8 mode. Because it breaks up UTF-8  char-
+       acters  into individual bytes, what remains in the string may be a mal-
+       formed UTF-8 string. For this reason, the \C escape  sequence  is  best
        avoided.


-       PCRE does not allow \C to appear in  lookbehind  assertions  (described
-       below),  because  in UTF-8 mode this would make it impossible to calcu-
+       PCRE  does  not  allow \C to appear in lookbehind assertions (described
+       below), because in UTF-8 mode this would make it impossible  to  calcu-
        late the length of the lookbehind.
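
The circumflex, dollar, and dot rules described above can be tried out with a short sketch. It uses Python's standard re module as a stand-in, not PCRE itself, on the assumption that re.MULTILINE and re.DOTALL behave like PCRE_MULTILINE and PCRE_DOTALL for these particular cases. The variable names are illustrative only, and edge cases such as CRLF newlines or a newline that ends the subject are deliberately not covered, because the two engines may differ there.

```python
import re

subject = "def\nabc"  # "\n" represents the newline, as in the text above

# Single-line mode: ^ can match only at the very start of the subject.
single_line = re.search(r"^abc$", subject)

# Multiline mode: ^ also matches immediately after an internal newline.
multi_line = re.search(r"^abc$", subject, re.MULTILINE)

# \A is tied to the start of the subject even in multiline mode.
start_anchor = re.search(r"\Aabc", subject, re.MULTILINE)

# By default, $ also matches just before a newline at the end of the string.
before_final_newline = re.search(r"abc$", "abc\n")

# By default, dot does not match a newline; in dotall mode it does.
dot_default = re.search(r"def.abc", subject)
dot_all = re.search(r"def.abc", subject, re.DOTALL)

print(single_line is None, multi_line is not None, start_anchor is None)
print(before_final_newline is not None, dot_default is None, dot_all is not None)
```

Each printed value is True when the stand-in engine agrees with the behaviour described in the text.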



@@ -3768,97 +3777,97 @@
        closing square bracket. A closing square bracket on its own is not spe-
        cial by default.  However, if the PCRE_JAVASCRIPT_COMPAT option is set,
        a lone closing square bracket causes a compile-time error. If a closing
-       square bracket is required as a member of the class, it should  be  the
-       first  data  character  in  the  class (after an initial circumflex, if
+       square  bracket  is required as a member of the class, it should be the
+       first data character in the class  (after  an  initial  circumflex,  if
        present) or escaped with a backslash.


-       A character class matches a single character in the subject.  In  UTF-8
+       A  character  class matches a single character in the subject. In UTF-8
        mode, the character may be more than one byte long. A matched character
        must be in the set of characters defined by the class, unless the first
-       character  in  the  class definition is a circumflex, in which case the
-       subject character must not be in the set defined by  the  class.  If  a
-       circumflex  is actually required as a member of the class, ensure it is
+       character in the class definition is a circumflex, in  which  case  the
+       subject  character  must  not  be in the set defined by the class. If a
+       circumflex is actually required as a member of the class, ensure it  is
        not the first character, or escape it with a backslash.


-       For example, the character class [aeiou] matches any lower case  vowel,
-       while  [^aeiou]  matches  any character that is not a lower case vowel.
+       For  example, the character class [aeiou] matches any lower case vowel,
+       while [^aeiou] matches any character that is not a  lower  case  vowel.
        Note that a circumflex is just a convenient notation for specifying the
-       characters  that  are in the class by enumerating those that are not. A
-       class that starts with a circumflex is not an assertion; it still  con-
-       sumes  a  character  from the subject string, and therefore it fails if
+       characters that are in the class by enumerating those that are  not.  A
+       class  that starts with a circumflex is not an assertion; it still con-
+       sumes a character from the subject string, and therefore  it  fails  if
        the current pointer is at the end of the string.


-       In UTF-8 mode, characters with values greater than 255 can be  included
-       in  a  class as a literal string of bytes, or by using the \x{ escaping
+       In  UTF-8 mode, characters with values greater than 255 can be included
+       in a class as a literal string of bytes, or by using the  \x{  escaping
        mechanism.


-       When caseless matching is set, any letters in a  class  represent  both
-       their  upper  case  and lower case versions, so for example, a caseless
-       [aeiou] matches "A" as well as "a", and a caseless  [^aeiou]  does  not
-       match  "A", whereas a caseful version would. In UTF-8 mode, PCRE always
-       understands the concept of case for characters whose  values  are  less
-       than  128, so caseless matching is always possible. For characters with
-       higher values, the concept of case is supported  if  PCRE  is  compiled
-       with  Unicode  property support, but not otherwise.  If you want to use
-       caseless matching in UTF8-mode for characters 128 and above,  you  must
-       ensure  that  PCRE is compiled with Unicode property support as well as
+       When  caseless  matching  is set, any letters in a class represent both
+       their upper case and lower case versions, so for  example,  a  caseless
+       [aeiou]  matches  "A"  as well as "a", and a caseless [^aeiou] does not
+       match "A", whereas a caseful version would. In UTF-8 mode, PCRE  always
+       understands  the  concept  of case for characters whose values are less
+       than 128, so caseless matching is always possible. For characters  with
+       higher  values,  the  concept  of case is supported if PCRE is compiled
+       with Unicode property support, but not otherwise.  If you want  to  use
+       caseless  matching  in UTF8-mode for characters 128 and above, you must
+       ensure that PCRE is compiled with Unicode property support as  well  as
        with UTF-8 support.


-       Characters that might indicate line breaks are  never  treated  in  any
-       special  way  when  matching  character  classes,  whatever line-ending
-       sequence is in  use,  and  whatever  setting  of  the  PCRE_DOTALL  and
+       Characters  that  might  indicate  line breaks are never treated in any
+       special way  when  matching  character  classes,  whatever  line-ending
+       sequence  is  in  use,  and  whatever  setting  of  the PCRE_DOTALL and
        PCRE_MULTILINE options is used. A class such as [^a] always matches one
        of these characters.


-       The minus (hyphen) character can be used to specify a range of  charac-
-       ters  in  a  character  class.  For  example,  [d-m] matches any letter
-       between d and m, inclusive. If a  minus  character  is  required  in  a
-       class,  it  must  be  escaped  with a backslash or appear in a position
-       where it cannot be interpreted as indicating a range, typically as  the
+       The  minus (hyphen) character can be used to specify a range of charac-
+       ters in a character  class.  For  example,  [d-m]  matches  any  letter
+       between  d  and  m,  inclusive.  If  a minus character is required in a
+       class, it must be escaped with a backslash  or  appear  in  a  position
+       where  it cannot be interpreted as indicating a range, typically as the
        first or last character in the class.


        It is not possible to have the literal character "]" as the end charac-
-       ter of a range. A pattern such as [W-]46] is interpreted as a class  of
-       two  characters ("W" and "-") followed by a literal string "46]", so it
-       would match "W46]" or "-46]". However, if the "]"  is  escaped  with  a
-       backslash  it is interpreted as the end of range, so [W-\]46] is inter-
-       preted as a class containing a range followed by two other  characters.
-       The  octal or hexadecimal representation of "]" can also be used to end
+       ter  of a range. A pattern such as [W-]46] is interpreted as a class of
+       two characters ("W" and "-") followed by a literal string "46]", so  it
+       would  match  "W46]"  or  "-46]". However, if the "]" is escaped with a
+       backslash it is interpreted as the end of range, so [W-\]46] is  inter-
+       preted  as a class containing a range followed by two other characters.
+       The octal or hexadecimal representation of "]" can also be used to  end
        a range.


-       Ranges operate in the collating sequence of character values. They  can
-       also   be  used  for  characters  specified  numerically,  for  example
-       [\000-\037]. In UTF-8 mode, ranges can include characters whose  values
+       Ranges  operate in the collating sequence of character values. They can
+       also  be  used  for  characters  specified  numerically,  for   example
+       [\000-\037].  In UTF-8 mode, ranges can include characters whose values
        are greater than 255, for example [\x{100}-\x{2ff}].


        If a range that includes letters is used when caseless matching is set,
        it matches the letters in either case. For example, [W-c] is equivalent
-       to  [][\\^_`wxyzabc],  matched  caselessly,  and  in non-UTF-8 mode, if
-       character tables for a French locale are in  use,  [\xc8-\xcb]  matches
-       accented  E  characters in both cases. In UTF-8 mode, PCRE supports the
-       concept of case for characters with values greater than 128  only  when
+       to [][\\^_`wxyzabc], matched caselessly,  and  in  non-UTF-8  mode,  if
+       character  tables  for  a French locale are in use, [\xc8-\xcb] matches
+       accented E characters in both cases. In UTF-8 mode, PCRE  supports  the
+       concept  of  case for characters with values greater than 128 only when
        it is compiled with Unicode property support.


-       The  character types \d, \D, \p, \P, \s, \S, \w, and \W may also appear
-       in a character class, and add the characters that  they  match  to  the
+       The character types \d, \D, \p, \P, \s, \S, \w, and \W may also  appear
+       in  a  character  class,  and add the characters that they match to the
        class. For example, [\dABCDEF] matches any hexadecimal digit. A circum-
-       flex can conveniently be used with the upper case  character  types  to
-       specify  a  more  restricted  set of characters than the matching lower
-       case type. For example, the class [^\W_] matches any letter  or  digit,
+       flex  can  conveniently  be used with the upper case character types to
+       specify a more restricted set of characters  than  the  matching  lower
+       case  type.  For example, the class [^\W_] matches any letter or digit,
        but not underscore.


-       The  only  metacharacters  that are recognized in character classes are
-       backslash, hyphen (only where it can be  interpreted  as  specifying  a
-       range),  circumflex  (only  at the start), opening square bracket (only
-       when it can be interpreted as introducing a POSIX class name - see  the
-       next  section),  and  the  terminating closing square bracket. However,
+       The only metacharacters that are recognized in  character  classes  are
+       backslash,  hyphen  (only  where  it can be interpreted as specifying a
+       range), circumflex (only at the start), opening  square  bracket  (only
+       when  it can be interpreted as introducing a POSIX class name - see the
+       next section), and the terminating  closing  square  bracket.  However,
        escaping other non-alphanumeric characters does no harm.



POSIX CHARACTER CLASSES

        Perl supports the POSIX notation for character classes. This uses names
-       enclosed  by  [: and :] within the enclosing square brackets. PCRE also
+       enclosed by [: and :] within the enclosing square brackets.  PCRE  also
        supports this notation. For example,


          [01[:alpha:]%]
@@ -3881,18 +3890,18 @@
          word     "word" characters (same as \w)
          xdigit   hexadecimal digits


-       The  "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
-       and space (32). Notice that this list includes the VT  character  (code
+       The "space" characters are HT (9), LF (10), VT (11), FF (12), CR  (13),
+       and  space  (32). Notice that this list includes the VT character (code
        11). This makes "space" different to \s, which does not include VT (for
        Perl compatibility).


-       The name "word" is a Perl extension, and "blank"  is  a  GNU  extension
-       from  Perl  5.8. Another Perl extension is negation, which is indicated
+       The  name  "word"  is  a Perl extension, and "blank" is a GNU extension
+       from Perl 5.8. Another Perl extension is negation, which  is  indicated
        by a ^ character after the colon. For example,


          [12[:^digit:]]


-       matches "1", "2", or any non-digit. PCRE (and Perl) also recognize  the
+       matches  "1", "2", or any non-digit. PCRE (and Perl) also recognize the
        POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but
        these are not supported, and an error is given if they are encountered.
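
The square-bracket class rules from the preceding section (negation, caseless matching, an escaped "]" as the end of a range, and character types inside a class) can be checked with a small sketch. It again uses Python's re module as a stand-in for PCRE, which is an assumption: the two engines agree on these basic cases as far as the text describes them, but the POSIX names such as [:alpha:] are left out because Python's re does not implement them. The helper name in_class is hypothetical.

```python
import re

def in_class(char_class, ch, flags=0):
    """Return True if the single character ch is matched by char_class."""
    return re.fullmatch(char_class, ch, flags) is not None

# Plain and negated classes.
vowel_e = in_class(r"[aeiou]", "e")
non_vowel_x = in_class(r"[^aeiou]", "x")

# Caseless matching: letters stand for both cases, also in a negated class.
caseless_A = in_class(r"[aeiou]", "A", re.IGNORECASE)
caseless_neg_A = in_class(r"[^aeiou]", "A", re.IGNORECASE)
caseful_neg_A = in_class(r"[^aeiou]", "A")

# A negated class is not an assertion: it must consume a character,
# so it fails when nothing is left in the subject.
at_end = re.search(r"a[^aeiou]", "a")

# Line-break characters get no special treatment inside a class.
newline_in_neg = in_class(r"[^a]", "\n")

# An escaped "]" can end a range: W-] plus the characters 4 and 6.
range_hits = [c for c in "WZ]46-" if in_class(r"[W-\]46]", c)]

# Character types inside a class.
hex_digits = [c for c in "7Fg" if in_class(r"[\dABCDEF]", c)]
alnum_only = [c for c in "a7_" if in_class(r"[^\W_]", c)]

print(range_hits, hex_digits, alnum_only)
```

Wrapping re.fullmatch in one helper keeps each check to a single character, which mirrors the statement that a class always matches exactly one character of the subject.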


@@ -3902,24 +3911,24 @@

VERTICAL BAR

-       Vertical  bar characters are used to separate alternative patterns. For
+       Vertical bar characters are used to separate alternative patterns.  For
        example, the pattern


          gilbert|sullivan


-       matches either "gilbert" or "sullivan". Any number of alternatives  may
-       appear,  and  an  empty  alternative  is  permitted (matching the empty
+       matches  either "gilbert" or "sullivan". Any number of alternatives may
+       appear, and an empty  alternative  is  permitted  (matching  the  empty
        string). The matching process tries each alternative in turn, from left
-       to  right, and the first one that succeeds is used. If the alternatives
-       are within a subpattern (defined below), "succeeds" means matching  the
+       to right, and the first one that succeeds is used. If the  alternatives
+       are  within a subpattern (defined below), "succeeds" means matching the
        rest of the main pattern as well as the alternative in the subpattern.



INTERNAL OPTION SETTING

-       The  settings  of  the  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
-       PCRE_EXTENDED options (which are Perl-compatible) can be  changed  from
-       within  the  pattern  by  a  sequence  of  Perl option letters enclosed
+       The settings of the  PCRE_CASELESS,  PCRE_MULTILINE,  PCRE_DOTALL,  and
+       PCRE_EXTENDED  options  (which are Perl-compatible) can be changed from
+       within the pattern by  a  sequence  of  Perl  option  letters  enclosed
        between "(?" and ")".  The option letters are


          i  for PCRE_CASELESS
@@ -3929,46 +3938,46 @@


        For example, (?im) sets caseless, multiline matching. It is also possi-
        ble to unset these options by preceding the letter with a hyphen, and a
-       combined setting and unsetting such as (?im-sx), which sets  PCRE_CASE-
-       LESS  and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED,
-       is also permitted. If a  letter  appears  both  before  and  after  the
+       combined  setting and unsetting such as (?im-sx), which sets PCRE_CASE-
+       LESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and  PCRE_EXTENDED,
+       is  also  permitted.  If  a  letter  appears  both before and after the
        hyphen, the option is unset.


-       The  PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA
-       can be changed in the same way as the Perl-compatible options by  using
+       The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and  PCRE_EXTRA
+       can  be changed in the same way as the Perl-compatible options by using
        the characters J, U and X respectively.


-       When  one  of  these  option  changes occurs at top level (that is, not
-       inside subpattern parentheses), the change applies to the remainder  of
+       When one of these option changes occurs at  top  level  (that  is,  not
+       inside  subpattern parentheses), the change applies to the remainder of
        the pattern that follows. If the change is placed right at the start of
        a pattern, PCRE extracts it into the global options (and it will there-
        fore show up in data extracted by the pcre_fullinfo() function).


-       An  option  change  within a subpattern (see below for a description of
+       An option change within a subpattern (see below for  a  description  of
        subpatterns) affects only that part of the current pattern that follows
        it, so


          (a(?i)b)c


        matches abc and aBc and no other strings (assuming PCRE_CASELESS is not
-       used).  By this means, options can be made to have  different  settings
-       in  different parts of the pattern. Any changes made in one alternative
-       do carry on into subsequent branches within the  same  subpattern.  For
+       used).   By  this means, options can be made to have different settings
+       in different parts of the pattern. Any changes made in one  alternative
+       do  carry  on  into subsequent branches within the same subpattern. For
        example,


          (a(?i)b|c)


-       matches  "ab",  "aB",  "c",  and "C", even though when matching "C" the
-       first branch is abandoned before the option setting.  This  is  because
-       the  effects  of option settings happen at compile time. There would be
+       matches "ab", "aB", "c", and "C", even though  when  matching  "C"  the
+       first  branch  is  abandoned before the option setting. This is because
+       the effects of option settings happen at compile time. There  would  be
        some very weird behaviour otherwise.


-       Note: There are other PCRE-specific options that  can  be  set  by  the
-       application  when  the  compile  or match functions are called. In some
+       Note:  There  are  other  PCRE-specific  options that can be set by the
+       application when the compile or match functions  are  called.  In  some
        cases the pattern can contain special leading sequences such as (*CRLF)
-       to  override  what  the application has set or what has been defaulted.
-       Details are given in the section entitled  "Newline  sequences"  above.
-       There  is  also  the  (*UTF8)  leading sequence that can be used to set
+       to override what the application has set or what  has  been  defaulted.
+       Details  are  given  in the section entitled "Newline sequences" above.
+       There is also the (*UTF8) leading sequence that  can  be  used  to  set
        UTF-8 mode; this is equivalent to setting the PCRE_UTF8 option.



@@ -3981,18 +3990,18 @@

          cat(aract|erpillar|)


-       matches  one  of the words "cat", "cataract", or "caterpillar". Without
-       the parentheses, it would match  "cataract",  "erpillar"  or  an  empty
+       matches one of the words "cat", "cataract", or  "caterpillar".  Without
+       the  parentheses,  it  would  match  "cataract", "erpillar" or an empty
        string.


-       2.  It  sets  up  the  subpattern as a capturing subpattern. This means
-       that, when the whole pattern  matches,  that  portion  of  the  subject
+       2. It sets up the subpattern as  a  capturing  subpattern.  This  means
+       that,  when  the  whole  pattern  matches,  that portion of the subject
        string that matched the subpattern is passed back to the caller via the
-       ovector argument of pcre_exec(). Opening parentheses are  counted  from
-       left  to  right  (starting  from 1) to obtain numbers for the capturing
+       ovector  argument  of pcre_exec(). Opening parentheses are counted from
+       left to right (starting from 1) to obtain  numbers  for  the  capturing
        subpatterns.


-       For example, if the string "the red king" is matched against  the  pat-
+       For  example,  if the string "the red king" is matched against the pat-
        tern


          the ((red|white) (king|queen))
@@ -4000,12 +4009,12 @@
        the captured substrings are "red king", "red", and "king", and are num-
        bered 1, 2, and 3, respectively.


-       The fact that plain parentheses fulfil  two  functions  is  not  always
-       helpful.   There are often times when a grouping subpattern is required
-       without a capturing requirement. If an opening parenthesis is  followed
-       by  a question mark and a colon, the subpattern does not do any captur-
-       ing, and is not counted when computing the  number  of  any  subsequent
-       capturing  subpatterns. For example, if the string "the white queen" is
+       The  fact  that  plain  parentheses  fulfil two functions is not always
+       helpful.  There are often times when a grouping subpattern is  required
+       without  a capturing requirement. If an opening parenthesis is followed
+       by a question mark and a colon, the subpattern does not do any  captur-
+       ing,  and  is  not  counted when computing the number of any subsequent
+       capturing subpatterns. For example, if the string "the white queen"  is
        matched against the pattern


          the ((?:red|white) (king|queen))
@@ -4013,96 +4022,96 @@
        the captured substrings are "white queen" and "queen", and are numbered
        1 and 2. The maximum number of capturing subpatterns is 65535.
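
The numbering of capturing subpatterns in the two examples above can be reproduced with a short sketch. Python's re module is used here as a stand-in for PCRE, on the assumption that it counts opening parentheses from left to right and skips (?: groups in the same way. The variable names are illustrative, and the captured substrings come back through Python's match object rather than through the ovector argument of pcre_exec().

```python
import re

# Plain parentheses both group and capture; they are numbered 1, 2, 3
# in the order of their opening parentheses.
capturing = re.fullmatch(r"the ((red|white) (king|queen))", "the red king")

# (?: groups without capturing and is not counted, so the group that
# follows it moves down from number 3 to number 2.
non_capturing = re.fullmatch(r"the ((?:red|white) (king|queen))",
                             "the white queen")

print(capturing.groups())
print(non_capturing.groups())
```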


-       As  a  convenient shorthand, if any option settings are required at the
-       start of a non-capturing subpattern,  the  option  letters  may  appear
+       As a convenient shorthand, if any option settings are required  at  the
+       start  of  a  non-capturing  subpattern,  the option letters may appear
        between the "?" and the ":". Thus the two patterns


          (?i:saturday|sunday)
          (?:(?i)saturday|sunday)


        match exactly the same set of strings. Because alternative branches are
-       tried from left to right, and options are not reset until  the  end  of
-       the  subpattern is reached, an option setting in one branch does affect
-       subsequent branches, so the above patterns match "SUNDAY"  as  well  as
+       tried  from  left  to right, and options are not reset until the end of
+       the subpattern is reached, an option setting in one branch does  affect
+       subsequent  branches,  so  the above patterns match "SUNDAY" as well as
        "Saturday".



DUPLICATE SUBPATTERN NUMBERS

        Perl 5.10 introduced a feature whereby each alternative in a subpattern
-       uses the same numbers for its capturing parentheses. Such a  subpattern
-       starts  with (?| and is itself a non-capturing subpattern. For example,
+       uses  the same numbers for its capturing parentheses. Such a subpattern
+       starts with (?| and is itself a non-capturing subpattern. For  example,
        consider this pattern:


          (?|(Sat)ur|(Sun))day


-       Because the two alternatives are inside a (?| group, both sets of  cap-
-       turing  parentheses  are  numbered one. Thus, when the pattern matches,
-       you can look at captured substring number  one,  whichever  alternative
-       matched.  This  construct  is useful when you want to capture part, but
+       Because  the two alternatives are inside a (?| group, both sets of cap-
+       turing parentheses are numbered one. Thus, when  the  pattern  matches,
+       you  can  look  at captured substring number one, whichever alternative
+       matched. This construct is useful when you want to  capture  part,  but
        not all, of one of a number of alternatives. Inside a (?| group, paren-
-       theses  are  numbered as usual, but the number is reset at the start of
-       each branch. The numbers of any capturing buffers that follow the  sub-
-       pattern  start after the highest number used in any branch. The follow-
-       ing example is taken from the Perl documentation.  The  numbers  under-
+       theses are numbered as usual, but the number is reset at the  start  of
+       each  branch. The numbers of any capturing buffers that follow the sub-
+       pattern start after the highest number used in any branch. The  follow-
+       ing  example  is taken from the Perl documentation.  The numbers under-
        neath show in which buffer the captured content will be stored.


          # before  ---------------branch-reset----------- after
          / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
          # 1            2         2  3        2     3     4


-       A  back  reference  to a numbered subpattern uses the most recent value
-       that is set for that number by any subpattern.  The  following  pattern
+       A back reference to a numbered subpattern uses the  most  recent  value
+       that  is  set  for that number by any subpattern. The following pattern
        matches "abcabc" or "defdef":


          /(?|(abc)|(def))\1/


-       In  contrast, a recursive or "subroutine" call to a numbered subpattern
-       always refers to the first one in the pattern with  the  given  number.
+       In contrast, a recursive or "subroutine" call to a numbered  subpattern
+       always  refers  to  the first one in the pattern with the given number.
        The following pattern matches "abcabc" or "defabc":


          /(?|(abc)|(def))(?1)/


-       If  a condition test for a subpattern's having matched refers to a non-
-       unique number, the test is true if any of the subpatterns of that  num-
+       If a condition test for a subpattern's having matched refers to a  non-
+       unique  number, the test is true if any of the subpatterns of that num-
        ber have matched.


-       An  alternative approach to using this "branch reset" feature is to use
+       An alternative approach to using this "branch reset" feature is to  use
        duplicate named subpatterns, as described in the next section.



NAMED SUBPATTERNS

-       Identifying capturing parentheses by number is simple, but  it  can  be
-       very  hard  to keep track of the numbers in complicated regular expres-
-       sions. Furthermore, if an  expression  is  modified,  the  numbers  may
-       change.  To help with this difficulty, PCRE supports the naming of sub-
+       Identifying  capturing  parentheses  by number is simple, but it can be
+       very hard to keep track of the numbers in complicated  regular  expres-
+       sions.  Furthermore,  if  an  expression  is  modified, the numbers may
+       change. To help with this difficulty, PCRE supports the naming of  sub-
        patterns. This feature was not added to Perl until release 5.10. Python
-       had  the  feature earlier, and PCRE introduced it at release 4.0, using
-       the Python syntax. PCRE now supports both the Perl and the Python  syn-
-       tax.  Perl  allows  identically  numbered subpatterns to have different
+       had the feature earlier, and PCRE introduced it at release  4.0,  using
+       the  Python syntax. PCRE now supports both the Perl and the Python syn-
+       tax. Perl allows identically numbered  subpatterns  to  have  different
        names, but PCRE does not.


-       In PCRE, a subpattern can be named in one of three  ways:  (?<name>...)
-       or  (?'name'...)  as in Perl, or (?P<name>...) as in Python. References
-       to capturing parentheses from other parts of the pattern, such as  back
-       references,  recursion,  and conditions, can be made by name as well as
+       In  PCRE,  a subpattern can be named in one of three ways: (?<name>...)
+       or (?'name'...) as in Perl, or (?P<name>...) as in  Python.  References
+       to  capturing parentheses from other parts of the pattern, such as back
+       references, recursion, and conditions, can be made by name as  well  as
        by number.


-       Names consist of up to  32  alphanumeric  characters  and  underscores.
-       Named  capturing  parentheses  are  still  allocated numbers as well as
-       names, exactly as if the names were not present. The PCRE API  provides
+       Names  consist  of  up  to  32 alphanumeric characters and underscores.
+       Named capturing parentheses are still  allocated  numbers  as  well  as
+       names,  exactly as if the names were not present. The PCRE API provides
        function calls for extracting the name-to-number translation table from
        a compiled pattern. There is also a convenience function for extracting
        a captured substring by name.


-       By  default, a name must be unique within a pattern, but it is possible
+       By default, a name must be unique within a pattern, but it is  possible
        to relax this constraint by setting the PCRE_DUPNAMES option at compile
-       time.  (Duplicate  names are also always permitted for subpatterns with
-       the same number, set up as described in the previous  section.)  Dupli-
-       cate  names  can  be useful for patterns where only one instance of the
-       named parentheses can match. Suppose you want to match the  name  of  a
-       weekday,  either as a 3-letter abbreviation or as the full name, and in
+       time. (Duplicate names are also always permitted for  subpatterns  with
+       the  same  number, set up as described in the previous section.) Dupli-
+       cate names can be useful for patterns where only one  instance  of  the
+       named  parentheses  can  match. Suppose you want to match the name of a
+       weekday, either as a 3-letter abbreviation or as the full name, and  in
        both cases you want to extract the abbreviation. This pattern (ignoring
        the line breaks) does the job:


@@ -4112,38 +4121,38 @@
          (?<DN>Thu)(?:rsday)?|
          (?<DN>Sat)(?:urday)?


-       There  are  five capturing substrings, but only one is ever set after a
+       There are five capturing substrings, but only one is ever set  after  a
        match.  (An alternative way of solving this problem is to use a "branch
        reset" subpattern, as described in the previous section.)


-       The  convenience  function  for extracting the data by name returns the
-       substring for the first (and in this example, the only)  subpattern  of
-       that  name  that  matched.  This saves searching to find which numbered
+       The convenience function for extracting the data by  name  returns  the
+       substring  for  the first (and in this example, the only) subpattern of
+       that name that matched. This saves searching  to  find  which  numbered
        subpattern it was.


-       If you make a back reference to  a  non-unique  named  subpattern  from
-       elsewhere  in the pattern, the one that corresponds to the first occur-
+       If  you  make  a  back  reference to a non-unique named subpattern from
+       elsewhere in the pattern, the one that corresponds to the first  occur-
        rence of the name is used. In the absence of duplicate numbers (see the
-       previous  section) this is the one with the lowest number. If you use a
-       named reference in a condition test (see the section  about  conditions
-       below),  either  to check whether a subpattern has matched, or to check
-       for recursion, all subpatterns with the same name are  tested.  If  the
-       condition  is  true for any one of them, the overall condition is true.
+       previous section) this is the one with the lowest number. If you use  a
+       named  reference  in a condition test (see the section about conditions
+       below), either to check whether a subpattern has matched, or  to  check
+       for  recursion,  all  subpatterns with the same name are tested. If the
+       condition is true for any one of them, the overall condition  is  true.
        This is the same behaviour as testing by number. For further details of
        the interfaces for handling named subpatterns, see the pcreapi documen-
        tation.


        Warning: You cannot use different names to distinguish between two sub-
-       patterns  with  the same number because PCRE uses only the numbers when
+       patterns with the same number because PCRE uses only the  numbers  when
        matching. For this reason, an error is given at compile time if differ-
-       ent  names  are given to subpatterns with the same number. However, you
-       can give the same name to subpatterns with the same number,  even  when
+       ent names are given to subpatterns with the same number.  However,  you
+       can  give  the same name to subpatterns with the same number, even when
        PCRE_DUPNAMES is not set.



REPETITION

-       Repetition  is  specified  by  quantifiers, which can follow any of the
+       Repetition is specified by quantifiers, which can  follow  any  of  the
        following items:


          a literal data character
@@ -4157,17 +4166,17 @@
          a parenthesized subpattern (unless it is an assertion)
          a recursive or "subroutine" call to a subpattern


-       The general repetition quantifier specifies a minimum and maximum  num-
-       ber  of  permitted matches, by giving the two numbers in curly brackets
-       (braces), separated by a comma. The numbers must be  less  than  65536,
+       The  general repetition quantifier specifies a minimum and maximum num-
+       ber of permitted matches, by giving the two numbers in  curly  brackets
+       (braces),  separated  by  a comma. The numbers must be less than 65536,
        and the first must be less than or equal to the second. For example:


          z{2,4}


-       matches  "zz",  "zzz",  or  "zzzz". A closing brace on its own is not a
-       special character. If the second number is omitted, but  the  comma  is
-       present,  there  is  no upper limit; if the second number and the comma
-       are both omitted, the quantifier specifies an exact number of  required
+       matches "zz", "zzz", or "zzzz". A closing brace on its  own  is  not  a
+       special  character.  If  the second number is omitted, but the comma is
+       present, there is no upper limit; if the second number  and  the  comma
+       are  both omitted, the quantifier specifies an exact number of required
        matches. Thus


          [aeiou]{3,}
@@ -4176,49 +4185,49 @@


          \d{8}


-       matches  exactly  8  digits. An opening curly bracket that appears in a
-       position where a quantifier is not allowed, or one that does not  match
-       the  syntax of a quantifier, is taken as a literal character. For exam-
+       matches exactly 8 digits. An opening curly bracket that  appears  in  a
+       position  where a quantifier is not allowed, or one that does not match
+       the syntax of a quantifier, is taken as a literal character. For  exam-
        ple, {,6} is not a quantifier, but a literal string of four characters.


-       In UTF-8 mode, quantifiers apply to UTF-8  characters  rather  than  to
+       In  UTF-8  mode,  quantifiers  apply to UTF-8 characters rather than to
        individual bytes. Thus, for example, \x{100}{2} matches two UTF-8 char-
        acters, each of which is represented by a two-byte sequence. Similarly,
        when Unicode property support is available, \X{3} matches three Unicode
-       extended sequences, each of which may be several bytes long  (and  they
+       extended  sequences,  each of which may be several bytes long (and they
        may be of different lengths).


        The quantifier {0} is permitted, causing the expression to behave as if
        the previous item and the quantifier were not present. This may be use-
-       ful  for  subpatterns that are referenced as subroutines from elsewhere
+       ful for subpatterns that are referenced as subroutines  from  elsewhere
        in the pattern. Items other than subpatterns that have a {0} quantifier
        are omitted from the compiled pattern.


-       For  convenience, the three most common quantifiers have single-charac-
+       For convenience, the three most common quantifiers have  single-charac-
        ter abbreviations:


          *    is equivalent to {0,}
          +    is equivalent to {1,}
          ?    is equivalent to {0,1}


-       It is possible to construct infinite loops by  following  a  subpattern
+       It  is  possible  to construct infinite loops by following a subpattern
        that can match no characters with a quantifier that has no upper limit,
        for example:


          (a?)*


        Earlier versions of Perl and PCRE used to give an error at compile time
-       for  such  patterns. However, because there are cases where this can be
-       useful, such patterns are now accepted, but if any  repetition  of  the
-       subpattern  does in fact match no characters, the loop is forcibly bro-
+       for such patterns. However, because there are cases where this  can  be
+       useful,  such  patterns  are now accepted, but if any repetition of the
+       subpattern does in fact match no characters, the loop is forcibly  bro-
        ken.


-       By default, the quantifiers are "greedy", that is, they match  as  much
-       as  possible  (up  to  the  maximum number of permitted times), without
-       causing the rest of the pattern to fail. The classic example  of  where
+       By  default,  the quantifiers are "greedy", that is, they match as much
+       as possible (up to the maximum  number  of  permitted  times),  without
+       causing  the  rest of the pattern to fail. The classic example of where
        this gives problems is in trying to match comments in C programs. These
-       appear between /* and */ and within the comment,  individual  *  and  /
-       characters  may  appear. An attempt to match C comments by applying the
+       appear  between  /*  and  */ and within the comment, individual * and /
+       characters may appear. An attempt to match C comments by  applying  the
        pattern


          /\*.*\*/
@@ -4227,19 +4236,19 @@


          /* first comment */  not comment  /* second comment */


-       fails, because it matches the entire string owing to the greediness  of
+       fails,  because it matches the entire string owing to the greediness of
        the .*  item.


-       However,  if  a quantifier is followed by a question mark, it ceases to
+       However, if a quantifier is followed by a question mark, it  ceases  to
        be greedy, and instead matches the minimum number of times possible, so
        the pattern


          /\*.*?\*/


-       does  the  right  thing with the C comments. The meaning of the various
-       quantifiers is not otherwise changed,  just  the  preferred  number  of
-       matches.   Do  not  confuse this use of question mark with its use as a
-       quantifier in its own right. Because it has two uses, it can  sometimes
+       does the right thing with the C comments. The meaning  of  the  various
+       quantifiers  is  not  otherwise  changed,  just the preferred number of
+       matches.  Do not confuse this use of question mark with its  use  as  a
+       quantifier  in its own right. Because it has two uses, it can sometimes
        appear doubled, as in


          \d??\d
@@ -4247,36 +4256,36 @@
        which matches one digit by preference, but can match two if that is the
        only way the rest of the pattern matches.


-       If the PCRE_UNGREEDY option is set (an option that is not available  in
-       Perl),  the  quantifiers are not greedy by default, but individual ones
-       can be made greedy by following them with a  question  mark.  In  other
+       If  the PCRE_UNGREEDY option is set (an option that is not available in
+       Perl), the quantifiers are not greedy by default, but  individual  ones
+       can  be  made  greedy  by following them with a question mark. In other
        words, it inverts the default behaviour.


-       When  a  parenthesized  subpattern  is quantified with a minimum repeat
-       count that is greater than 1 or with a limited maximum, more memory  is
-       required  for  the  compiled  pattern, in proportion to the size of the
+       When a parenthesized subpattern is quantified  with  a  minimum  repeat
+       count  that is greater than 1 or with a limited maximum, more memory is
+       required for the compiled pattern, in proportion to  the  size  of  the
        minimum or maximum.


        If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equiv-
-       alent  to  Perl's  /s) is set, thus allowing the dot to match newlines,
-       the pattern is implicitly anchored, because whatever  follows  will  be
-       tried  against every character position in the subject string, so there
-       is no point in retrying the overall match at  any  position  after  the
-       first.  PCRE  normally treats such a pattern as though it were preceded
+       alent to Perl's /s) is set, thus allowing the dot  to  match  newlines,
+       the  pattern  is  implicitly anchored, because whatever follows will be
+       tried against every character position in the subject string, so  there
+       is  no  point  in  retrying the overall match at any position after the
+       first. PCRE normally treats such a pattern as though it  were  preceded
        by \A.


-       In cases where it is known that the subject  string  contains  no  new-
-       lines,  it  is  worth setting PCRE_DOTALL in order to obtain this opti-
+       In  cases  where  it  is known that the subject string contains no new-
+       lines, it is worth setting PCRE_DOTALL in order to  obtain  this  opti-
        mization, or alternatively using ^ to indicate anchoring explicitly.


-       However, there is one situation where the optimization cannot be  used.
+       However,  there is one situation where the optimization cannot be used.
        When .*  is inside capturing parentheses that are the subject of a back
        reference elsewhere in the pattern, a match at the start may fail where
        a later one succeeds. Consider, for example:


          (.*)abc\1


-       If  the subject is "xyz123abc123" the match point is the fourth charac-
+       If the subject is "xyz123abc123" the match point is the fourth  charac-
        ter. For this reason, such a pattern is not implicitly anchored.


        When a capturing subpattern is repeated, the value captured is the sub-
@@ -4285,8 +4294,8 @@
          (tweedle[dume]{3}\s*)+


        has matched "tweedledum tweedledee" the value of the captured substring
-       is "tweedledee". However, if there are  nested  capturing  subpatterns,
-       the  corresponding captured values may have been set in previous itera-
+       is  "tweedledee".  However,  if there are nested capturing subpatterns,
+       the corresponding captured values may have been set in previous  itera-
        tions. For example, after


          /(a|(b))+/
@@ -4296,53 +4305,53 @@


ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS

-       With both maximizing ("greedy") and minimizing ("ungreedy"  or  "lazy")
-       repetition,  failure  of what follows normally causes the repeated item
-       to be re-evaluated to see if a different number of repeats  allows  the
-       rest  of  the pattern to match. Sometimes it is useful to prevent this,
-       either to change the nature of the match, or to cause it to fail earlier
-       than  it otherwise might, when the author of the pattern knows there is
+       With  both  maximizing ("greedy") and minimizing ("ungreedy" or "lazy")
+       repetition, failure of what follows normally causes the  repeated  item
+       to  be  re-evaluated to see if a different number of repeats allows the
+       rest of the pattern to match. Sometimes it is useful to  prevent  this,
+       either to change the nature of the match, or to cause it to fail earlier
+       than it otherwise might, when the author of the pattern knows there  is
        no point in carrying on.


-       Consider, for example, the pattern \d+foo when applied to  the  subject
+       Consider,  for  example, the pattern \d+foo when applied to the subject
        line


          123456bar


        After matching all 6 digits and then failing to match "foo", the normal
-       action of the matcher is to try again with only 5 digits  matching  the
-       \d+  item,  and  then  with  4,  and  so on, before ultimately failing.
-       "Atomic grouping" (a term taken from Jeffrey  Friedl's  book)  provides
-       the  means for specifying that once a subpattern has matched, it is not
+       action  of  the matcher is to try again with only 5 digits matching the
+       \d+ item, and then with  4,  and  so  on,  before  ultimately  failing.
+       "Atomic  grouping"  (a  term taken from Jeffrey Friedl's book) provides
+       the means for specifying that once a subpattern has matched, it is  not
        to be re-evaluated in this way.


-       If we use atomic grouping for the previous example, the  matcher  gives
-       up  immediately  on failing to match "foo" the first time. The notation
+       If  we  use atomic grouping for the previous example, the matcher gives
+       up immediately on failing to match "foo" the first time.  The  notation
        is a kind of special parenthesis, starting with (?> as in this example:


          (?>\d+)foo


-       This kind of parenthesis "locks up" the  part of the  pattern  it  con-
-       tains  once  it  has matched, and a failure further into the pattern is
-       prevented from backtracking into it. Backtracking past it  to  previous
+       This  kind  of  parenthesis "locks up" the  part of the pattern it con-
+       tains once it has matched, and a failure further into  the  pattern  is
+       prevented  from  backtracking into it. Backtracking past it to previous
        items, however, works as normal.


-       An  alternative  description  is that a subpattern of this type matches
-       the string of characters that an  identical  standalone  pattern  would
+       An alternative description is that a subpattern of  this  type  matches
+       the  string  of  characters  that an identical standalone pattern would
        match, if anchored at the current point in the subject string.


        Atomic grouping subpatterns are not capturing subpatterns. Simple cases
        such as the above example can be thought of as a maximizing repeat that
-       must  swallow  everything  it can. So, while both \d+ and \d+? are pre-
-       pared to adjust the number of digits they match in order  to  make  the
+       must swallow everything it can. So, while both \d+ and  \d+?  are  pre-
+       pared  to  adjust  the number of digits they match in order to make the
        rest of the pattern match, (?>\d+) can only match an entire sequence of
        digits.


-       Atomic groups in general can of course contain arbitrarily  complicated
-       subpatterns,  and  can  be  nested. However, when the subpattern for an
+       Atomic  groups in general can of course contain arbitrarily complicated
+       subpatterns, and can be nested. However, when  the  subpattern  for  an
        atomic group is just a single repeated item, as in the example above, a
-       simpler  notation,  called  a "possessive quantifier" can be used. This
-       consists of an additional + character  following  a  quantifier.  Using
+       simpler notation, called a "possessive quantifier" can  be  used.  This
+       consists  of  an  additional  + character following a quantifier. Using
        this notation, the previous example can be rewritten as


          \d++foo
@@ -4352,45 +4361,45 @@


          (abc|xyz){2,3}+


-       Possessive  quantifiers  are  always  greedy;  the   setting   of   the
+       Possessive   quantifiers   are   always  greedy;  the  setting  of  the
        PCRE_UNGREEDY option is ignored. They are a convenient notation for the
-       simpler forms of atomic group. However, there is no difference  in  the
-       meaning  of  a  possessive  quantifier and the equivalent atomic group,
-       though there may be a performance  difference;  possessive  quantifiers
+       simpler  forms  of atomic group. However, there is no difference in the
+       meaning of a possessive quantifier and  the  equivalent  atomic  group,
+       though  there  may  be a performance difference; possessive quantifiers
        should be slightly faster.


-       The  possessive  quantifier syntax is an extension to the Perl 5.8 syn-
-       tax.  Jeffrey Friedl originated the idea (and the name)  in  the  first
+       The possessive quantifier syntax is an extension to the Perl  5.8  syn-
+       tax.   Jeffrey  Friedl  originated the idea (and the name) in the first
        edition of his book. Mike McCloskey liked it, so implemented it when he
-       built Sun's Java package, and PCRE copied it from there. It  ultimately
+       built  Sun's Java package, and PCRE copied it from there. It ultimately
        found its way into Perl at release 5.10.


        PCRE has an optimization that automatically "possessifies" certain sim-
-       ple pattern constructs. For example, the sequence  A+B  is  treated  as
-       A++B  because  there is no point in backtracking into a sequence of A's
+       ple  pattern  constructs.  For  example, the sequence A+B is treated as
+       A++B because there is no point in backtracking into a sequence  of  A's
        when B must follow.


-       When a pattern contains an unlimited repeat inside  a  subpattern  that
-       can  itself  be  repeated  an  unlimited number of times, the use of an
-       atomic group is the only way to avoid some  failing  matches  taking  a
+       When  a  pattern  contains an unlimited repeat inside a subpattern that
+       can itself be repeated an unlimited number of  times,  the  use  of  an
+       atomic  group  is  the  only way to avoid some failing matches taking a
        very long time indeed. The pattern


          (\D+|<\d+>)*[!?]


-       matches  an  unlimited number of substrings that either consist of non-
-       digits, or digits enclosed in <>, followed by either ! or  ?.  When  it
+       matches an unlimited number of substrings that either consist  of  non-
+       digits,  or  digits  enclosed in <>, followed by either ! or ?. When it
        matches, it runs quickly. However, if it is applied to


          aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa


-       it  takes  a  long  time  before reporting failure. This is because the
-       string can be divided between the internal \D+ repeat and the  external
-       *  repeat  in  a  large  number of ways, and all have to be tried. (The
-       example uses [!?] rather than a single character at  the  end,  because
-       both  PCRE  and  Perl have an optimization that allows for fast failure
-       when a single character is used. They remember the last single  charac-
-       ter  that  is required for a match, and fail early if it is not present
-       in the string.) If the pattern is changed so that  it  uses  an  atomic
+       it takes a long time before reporting  failure.  This  is  because  the
+       string  can be divided between the internal \D+ repeat and the external
+       * repeat in a large number of ways, and all  have  to  be  tried.  (The
+       example  uses  [!?]  rather than a single character at the end, because
+       both PCRE and Perl have an optimization that allows  for  fast  failure
+       when  a single character is used. They remember the last single charac-
+       ter that is required for a match, and fail early if it is  not  present
+       in  the  string.)  If  the pattern is changed so that it uses an atomic
        group, like this:


          ((?>\D+)|<\d+>)*[!?]
@@ -4402,37 +4411,37 @@


        Outside a character class, a backslash followed by a digit greater than
        0 (and possibly further digits) is a back reference to a capturing sub-
-       pattern  earlier  (that is, to its left) in the pattern, provided there
+       pattern earlier (that is, to its left) in the pattern,  provided  there
        have been that many previous capturing left parentheses.


        However, if the decimal number following the backslash is less than 10,
-       it  is  always  taken  as a back reference, and causes an error only if
-       there are not that many capturing left parentheses in the  entire  pat-
-       tern.  In  other words, the parentheses that are referenced need not be
-       to the left of the reference for numbers less than 10. A "forward  back
-       reference"  of  this  type can make sense when a repetition is involved
-       and the subpattern to the right has participated in an  earlier  itera-
+       it is always taken as a back reference, and causes  an  error  only  if
+       there  are  not that many capturing left parentheses in the entire pat-
+       tern. In other words, the parentheses that are referenced need  not  be
+       to  the left of the reference for numbers less than 10. A "forward back
+       reference" of this type can make sense when a  repetition  is  involved
+       and  the  subpattern to the right has participated in an earlier itera-
        tion.


-       It  is  not  possible to have a numerical "forward back reference" to a
-       subpattern whose number is 10 or  more  using  this  syntax  because  a
-       sequence  such  as  \50 is interpreted as a character defined in octal.
+       It is not possible to have a numerical "forward back  reference"  to  a
+       subpattern  whose  number  is  10  or  more using this syntax because a
+       sequence such as \50 is interpreted as a character  defined  in  octal.
        See the subsection entitled "Non-printing characters" above for further
-       details  of  the  handling of digits following a backslash. There is no
-       such problem when named parentheses are used. A back reference  to  any
+       details of the handling of digits following a backslash.  There  is  no
+       such  problem  when named parentheses are used. A back reference to any
        subpattern is possible using named parentheses (see below).


-       Another  way  of  avoiding  the ambiguity inherent in the use of digits
+       Another way of avoiding the ambiguity inherent in  the  use  of  digits
        following a backslash is to use the \g escape sequence, which is a fea-
-       ture  introduced  in  Perl  5.10.  This  escape  must be followed by an
-       unsigned number or a negative number, optionally  enclosed  in  braces.
+       ture introduced in Perl 5.10.  This  escape  must  be  followed  by  an
+       unsigned  number  or  a negative number, optionally enclosed in braces.
        These examples are all identical:


          (ring), \1
          (ring), \g1
          (ring), \g{1}


-       An  unsigned number specifies an absolute reference without the ambigu-
+       An unsigned number specifies an absolute reference without the  ambigu-
        ity that is present in the older syntax. It is also useful when literal
        digits follow the reference. A negative number is a relative reference.
        Consider this example:
@@ -4440,33 +4449,33 @@
          (abc(def)ghi)\g{-1}


        The sequence \g{-1} is a reference to the most recently started captur-
-       ing  subpattern  before \g, that is, it is equivalent to \2. Similarly,
+       ing subpattern before \g, that is, it is equivalent to  \2.  Similarly,
        \g{-2} would be equivalent to \1. The use of relative references can be
-       helpful  in  long  patterns,  and  also in patterns that are created by
+       helpful in long patterns, and also in  patterns  that  are  created  by
        joining together fragments that contain references within themselves.


-       A back reference matches whatever actually matched the  capturing  sub-
-       pattern  in  the  current subject string, rather than anything matching
+       A  back  reference matches whatever actually matched the capturing sub-
+       pattern in the current subject string, rather  than  anything  matching
        the subpattern itself (see "Subpatterns as subroutines" below for a way
        of doing that). So the pattern


          (sens|respons)e and \1ibility


-       matches  "sense and sensibility" and "response and responsibility", but
-       not "sense and responsibility". If caseful matching is in force at  the
-       time  of the back reference, the case of letters is relevant. For exam-
+       matches "sense and sensibility" and "response and responsibility",  but
+       not  "sense and responsibility". If caseful matching is in force at the
+       time of the back reference, the case of letters is relevant. For  exam-
        ple,


          ((?i)rah)\s+\1


-       matches "rah rah" and "RAH RAH", but not "RAH  rah",  even  though  the
+       matches  "rah  rah"  and  "RAH RAH", but not "RAH rah", even though the
        original capturing subpattern is matched caselessly.


-       There  are  several  different ways of writing back references to named
-       subpatterns. The .NET syntax \k{name} and the Perl syntax  \k<name>  or
-       \k'name'  are supported, as is the Python syntax (?P=name). Perl 5.10's
+       There are several different ways of writing back  references  to  named
+       subpatterns.  The  .NET syntax \k{name} and the Perl syntax \k<name> or
+       \k'name' are supported, as is the Python syntax (?P=name). Perl  5.10's
        unified back reference syntax, in which \g can be used for both numeric
-       and  named  references,  is  also supported. We could rewrite the above
+       and named references, is also supported. We  could  rewrite  the  above
        example in any of the following ways:


          (?<p1>(?i)rah)\s+\k<p1>
@@ -4474,67 +4483,67 @@
          (?P<p1>(?i)rah)\s+(?P=p1)
          (?<p1>(?i)rah)\s+\g{p1}


-       A subpattern that is referenced by  name  may  appear  in  the  pattern
+       A  subpattern  that  is  referenced  by  name may appear in the pattern
        before or after the reference.


-       There  may be more than one back reference to the same subpattern. If a
-       subpattern has not actually been used in a particular match,  any  back
+       There may be more than one back reference to the same subpattern. If  a
+       subpattern  has  not actually been used in a particular match, any back
        references to it always fail by default. For example, the pattern


          (a|(bc))\2


-       always  fails  if  it starts to match "a" rather than "bc". However, if
+       always fails if it starts to match "a" rather than  "bc".  However,  if
        the PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back refer-
        ence to an unset value matches an empty string.
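
The default behaviour described above can be checked with a small sketch. It uses Python's standard re module as a stand-in for PCRE, on the assumption that both treat a back reference to an unset group as a failure; the helper name first_match is hypothetical, and the PCRE_JAVASCRIPT_COMPAT variant has no counterpart here.

```python
import re

# Group 2 is set only when the "bc" alternative is taken, so \2 can
# succeed only on that branch.
UNSET_BACKREF = re.compile(r"(a|(bc))\2")


def first_match(subject):
    """Return the text matched by (a|(bc))\\2, or None if there is no match."""
    m = UNSET_BACKREF.search(subject)
    return m.group(0) if m else None
```

With this helper, a subject such as "bcbc" matches in full, while "aa" does not match because group 2 is never set on the "a" branch.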


-       Because  there may be many capturing parentheses in a pattern, all dig-
-       its following a backslash are taken as part of a potential back  refer-
-       ence  number.   If  the  pattern continues with a digit character, some
-       delimiter must  be  used  to  terminate  the  back  reference.  If  the
+       Because there may be many capturing parentheses in a pattern, all  dig-
+       its  following a backslash are taken as part of a potential back refer-
+       ence number.  If the pattern continues with  a  digit  character,  some
+       delimiter  must  be  used  to  terminate  the  back  reference.  If the
        PCRE_EXTENDED option is set, this can be whitespace. Otherwise, the \g{
        syntax or an empty comment (see "Comments" below) can be used.


    Recursive back references


-       A back reference that occurs inside the parentheses to which it  refers
-       fails  when  the subpattern is first used, so, for example, (a\1) never
-       matches.  However, such references can be useful inside  repeated  sub-
+       A  back reference that occurs inside the parentheses to which it refers
+       fails when the subpattern is first used, so, for example,  (a\1)  never
+       matches.   However,  such references can be useful inside repeated sub-
        patterns. For example, the pattern


          (a|b\1)+


        matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
-       ation of the subpattern,  the  back  reference  matches  the  character
-       string  corresponding  to  the previous iteration. In order for this to
-       work, the pattern must be such that the first iteration does  not  need
-       to  match the back reference. This can be done using alternation, as in
+       ation  of  the  subpattern,  the  back  reference matches the character
+       string corresponding to the previous iteration. In order  for  this  to
+       work,  the  pattern must be such that the first iteration does not need
+       to match the back reference. This can be done using alternation, as  in
        the example above, or by a quantifier with a minimum of zero.


-       Back references of this type cause the group that they reference to  be
-       treated  as  an atomic group.  Once the whole group has been matched, a
-       subsequent matching failure cannot cause backtracking into  the  middle
+       Back  references of this type cause the group that they reference to be
+       treated as an atomic group.  Once the whole group has been  matched,  a
+       subsequent  matching  failure cannot cause backtracking into the middle
        of the group.



ASSERTIONS

-       An  assertion  is  a  test on the characters following or preceding the
-       current matching point that does not actually consume  any  characters.
-       The  simple  assertions  coded  as  \b, \B, \A, \G, \Z, \z, ^ and $ are
+       An assertion is a test on the characters  following  or  preceding  the
+       current  matching  point that does not actually consume any characters.
+       The simple assertions coded as \b, \B, \A, \G, \Z,  \z,  ^  and  $  are
        described above.


-       More complicated assertions are coded as  subpatterns.  There  are  two
-       kinds:  those  that  look  ahead of the current position in the subject
-       string, and those that look  behind  it.  An  assertion  subpattern  is
-       matched  in  the  normal way, except that it does not cause the current
+       More  complicated  assertions  are  coded as subpatterns. There are two
+       kinds: those that look ahead of the current  position  in  the  subject
+       string,  and  those  that  look  behind  it. An assertion subpattern is
+       matched in the normal way, except that it does not  cause  the  current
        matching position to be changed.


-       Assertion subpatterns are not capturing subpatterns,  and  may  not  be
-       repeated,  because  it  makes no sense to assert the same thing several
-       times. If any kind of assertion contains capturing  subpatterns  within
-       it,  these are counted for the purposes of numbering the capturing sub-
+       Assertion  subpatterns  are  not  capturing subpatterns, and may not be
+       repeated, because it makes no sense to assert the  same  thing  several
+       times.  If  any kind of assertion contains capturing subpatterns within
+       it, these are counted for the purposes of numbering the capturing  sub-
        patterns in the whole pattern.  However, substring capturing is carried
-       out  only  for  positive assertions, because it does not make sense for
+       out only for positive assertions, because it does not  make  sense  for
        negative assertions.


    Lookahead assertions
@@ -4544,38 +4553,38 @@


          \w+(?=;)


-       matches  a word followed by a semicolon, but does not include the semi-
+       matches a word followed by a semicolon, but does not include the  semi-
        colon in the match, and


          foo(?!bar)


-       matches any occurrence of "foo" that is not  followed  by  "bar".  Note
+       matches  any  occurrence  of  "foo" that is not followed by "bar". Note
        that the apparently similar pattern


          (?!foo)bar


-       does  not  find  an  occurrence  of "bar" that is preceded by something
-       other than "foo"; it finds any occurrence of "bar" whatsoever,  because
+       does not find an occurrence of "bar"  that  is  preceded  by  something
+       other  than "foo"; it finds any occurrence of "bar" whatsoever, because
        the assertion (?!foo) is always true when the next three characters are
        "bar". A lookbehind assertion is needed to achieve the other effect.


        If you want to force a matching failure at some point in a pattern, the
-       most  convenient  way  to  do  it  is with (?!) because an empty string
-       always matches, so an assertion that requires there not to be an  empty
-       string  must  always  fail.   The  Perl  5.10 backtracking control verb
+       most convenient way to do it is  with  (?!)  because  an  empty  string
+       always  matches, so an assertion that requires there not to be an empty
+       string must always fail.   The  Perl  5.10  backtracking  control  verb
        (*FAIL) or (*F) is essentially a synonym for (?!).


    Lookbehind assertions


-       Lookbehind assertions start with (?<= for positive assertions and  (?<!
+       Lookbehind  assertions start with (?<= for positive assertions and (?<!
        for negative assertions. For example,


          (?<!foo)bar


-       does  find  an  occurrence  of "bar" that is not preceded by "foo". The
-       contents of a lookbehind assertion are restricted  such  that  all  the
+       does find an occurrence of "bar" that is not  preceded  by  "foo".  The
+       contents  of  a  lookbehind  assertion are restricted such that all the
        strings it matches must have a fixed length. However, if there are sev-
-       eral top-level alternatives, they do not all  have  to  have  the  same
+       eral  top-level  alternatives,  they  do  not all have to have the same
        fixed length. Thus


          (?<=bullock|donkey)
@@ -4584,62 +4593,62 @@


          (?<!dogs?|cats?)


-       causes  an  error at compile time. Branches that match different length
-       strings are permitted only at the top level of a lookbehind  assertion.
-       This  is an extension compared with Perl (5.8 and 5.10), which requires
+       causes an error at compile time. Branches that match  different  length
+       strings  are permitted only at the top level of a lookbehind assertion.
+       This is an extension compared with Perl (5.8 and 5.10), which  requires
        all branches to match the same length of string. An assertion such as


          (?<=ab(c|de))


-       is not permitted, because its single top-level  branch  can  match  two
+       is  not  permitted,  because  its single top-level branch can match two
        different lengths, but it is acceptable to PCRE if rewritten to use two
        top-level branches:


          (?<=abc|abde)


        In some cases, the Perl 5.10 escape sequence \K (see above) can be used
-       instead  of  a  lookbehind  assertion  to  get  round  the fixed-length
+       instead of  a  lookbehind  assertion  to  get  round  the  fixed-length
        restriction.


-       The implementation of lookbehind assertions is, for  each  alternative,
-       to  temporarily  move the current position back by the fixed length and
+       The  implementation  of lookbehind assertions is, for each alternative,
+       to temporarily move the current position back by the fixed  length  and
        then try to match. If there are insufficient characters before the cur-
        rent position, the assertion fails.


        PCRE does not allow the \C escape (which matches a single byte in UTF-8
-       mode) to appear in lookbehind assertions, because it makes it  impossi-
-       ble  to  calculate the length of the lookbehind. The \X and \R escapes,
+       mode)  to appear in lookbehind assertions, because it makes it impossi-
+       ble to calculate the length of the lookbehind. The \X and  \R  escapes,
        which can match different numbers of bytes, are also not permitted.


-       "Subroutine" calls (see below) such as (?2) or (?&X) are  permitted  in
-       lookbehinds,  as  long as the subpattern matches a fixed-length string.
+       "Subroutine"  calls  (see below) such as (?2) or (?&X) are permitted in
+       lookbehinds, as long as the subpattern matches a  fixed-length  string.
        Recursion, however, is not supported.


-       Possessive quantifiers can  be  used  in  conjunction  with  lookbehind
+       Possessive  quantifiers  can  be  used  in  conjunction with lookbehind
        assertions to specify efficient matching of fixed-length strings at the
        end of subject strings. Consider a simple pattern such as


          abcd$


-       when applied to a long string that does  not  match.  Because  matching
+       when  applied  to  a  long string that does not match. Because matching
        proceeds from left to right, PCRE will look for each "a" in the subject
-       and then see if what follows matches the rest of the  pattern.  If  the
+       and  then  see  if what follows matches the rest of the pattern. If the
        pattern is specified as


          ^.*abcd$


-       the  initial .* matches the entire string at first, but when this fails
+       the initial .* matches the entire string at first, but when this  fails
        (because there is no following "a"), it backtracks to match all but the
-       last  character,  then all but the last two characters, and so on. Once
-       again the search for "a" covers the entire string, from right to  left,
+       last character, then all but the last two characters, and so  on.  Once
+       again  the search for "a" covers the entire string, from right to left,
        so we are no better off. However, if the pattern is written as


          ^.*+(?<=abcd)


-       there  can  be  no backtracking for the .*+ item; it can match only the
-       entire string. The subsequent lookbehind assertion does a  single  test
-       on  the last four characters. If it fails, the match fails immediately.
-       For long strings, this approach makes a significant difference  to  the
+       there can be no backtracking for the .*+ item; it can  match  only  the
+       entire  string.  The subsequent lookbehind assertion does a single test
+       on the last four characters. If it fails, the match fails  immediately.
+       For  long  strings, this approach makes a significant difference to the
        processing time.


    Using multiple assertions
@@ -4648,18 +4657,18 @@


          (?<=\d{3})(?<!999)foo


-       matches  "foo" preceded by three digits that are not "999". Notice that
-       each of the assertions is applied independently at the  same  point  in
-       the  subject  string.  First  there  is a check that the previous three
-       characters are all digits, and then there is  a  check  that  the  same
+       matches "foo" preceded by three digits that are not "999". Notice  that
+       each  of  the  assertions is applied independently at the same point in
+       the subject string. First there is a  check  that  the  previous  three
+       characters  are  all  digits,  and  then there is a check that the same
        three characters are not "999".  This pattern does not match "foo" pre-
-       ceded by six characters, the first of which are  digits  and  the  last
-       three  of  which  are not "999". For example, it doesn't match "123abc-
+       ceded  by  six  characters,  the first of which are digits and the last
+       three of which are not "999". For example, it  doesn't  match  "123abc-
        foo". A pattern to do that is


          (?<=\d{3}...)(?<!999)foo


-       This time the first assertion looks at the  preceding  six  characters,
+       This  time  the  first assertion looks at the preceding six characters,
        checking that the first three are digits, and then the second assertion
        checks that the preceding three characters are not "999".
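
The difference between the two patterns can be seen in a minimal sketch. It assumes Python's standard re module, which accepts these fixed-length lookbehinds with the same syntax as PCRE; the names THREE_DIGITS, SIX_CHARS and matches are hypothetical.

```python
import re

# Both assertions inspect the three characters immediately before "foo".
THREE_DIGITS = re.compile(r"(?<=\d{3})(?<!999)foo")

# The first assertion inspects six characters (three digits, then any
# three); the second still inspects only the last three.
SIX_CHARS = re.compile(r"(?<=\d{3}...)(?<!999)foo")


def matches(pattern, subject):
    """Report whether the compiled pattern matches anywhere in subject."""
    return pattern.search(subject) is not None
```

Under these assumptions, THREE_DIGITS accepts "123foo" but rejects both "999foo" and "123abcfoo", whereas SIX_CHARS accepts "123abcfoo" and rejects "123999foo".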


@@ -4667,96 +4676,96 @@

          (?<=(?<!foo)bar)baz


-       matches an occurrence of "baz" that is preceded by "bar" which in  turn
+       matches  an occurrence of "baz" that is preceded by "bar" which in turn
        is not preceded by "foo", while


          (?<=\d{3}(?!999)...)foo


-       is  another pattern that matches "foo" preceded by three digits and any
+       is another pattern that matches "foo" preceded by three digits and  any
        three characters that are not "999".



CONDITIONAL SUBPATTERNS

-       It is possible to cause the matching process to obey a subpattern  con-
-       ditionally  or to choose between two alternative subpatterns, depending
-       on the result of an assertion, or whether a specific capturing  subpat-
-       tern  has  already  been matched. The two possible forms of conditional
+       It  is possible to cause the matching process to obey a subpattern con-
+       ditionally or to choose between two alternative subpatterns,  depending
+       on  the result of an assertion, or whether a specific capturing subpat-
+       tern has already been matched. The two possible  forms  of  conditional
        subpattern are:


          (?(condition)yes-pattern)
          (?(condition)yes-pattern|no-pattern)


-       If the condition is satisfied, the yes-pattern is used;  otherwise  the
-       no-pattern  (if  present)  is used. If there are more than two alterna-
+       If  the  condition is satisfied, the yes-pattern is used; otherwise the
+       no-pattern (if present) is used. If there are more  than  two  alterna-
        tives in the subpattern, a compile-time error occurs.


-       There are four kinds of condition: references  to  subpatterns,  refer-
+       There  are  four  kinds of condition: references to subpatterns, refer-
        ences to recursion, a pseudo-condition called DEFINE, and assertions.


    Checking for a used subpattern by number


-       If  the  text between the parentheses consists of a sequence of digits,
+       If the text between the parentheses consists of a sequence  of  digits,
        the condition is true if a capturing subpattern of that number has pre-
-       viously  matched.  If  there is more than one capturing subpattern with
-       the same number (see the earlier  section  about  duplicate  subpattern
+       viously matched. If there is more than one  capturing  subpattern  with
+       the  same  number  (see  the earlier section about duplicate subpattern
        numbers), the condition is true if any of them have been set. An alter-
-       native notation is to precede the digits with a plus or minus sign.  In
-       this  case, the subpattern number is relative rather than absolute. The
-       most recently opened parentheses can be referenced by (?(-1), the  next
-       most  recent  by  (?(-2),  and so on. In looping constructs it can also
-       make sense to refer  to  subsequent  groups  with  constructs  such  as
+       native  notation is to precede the digits with a plus or minus sign. In
+       this case, the subpattern number is relative rather than absolute.  The
+       most  recently opened parentheses can be referenced by (?(-1), the next
+       most recent by (?(-2), and so on. In looping  constructs  it  can  also
+       make  sense  to  refer  to  subsequent  groups  with constructs such as
        (?(+2).


-       Consider  the  following  pattern, which contains non-significant white
+       Consider the following pattern, which  contains  non-significant  white
        space to make it more readable (assume the PCRE_EXTENDED option) and to
        divide it into three parts for ease of discussion:


          ( \( )?    [^()]+    (?(1) \) )


-       The  first  part  matches  an optional opening parenthesis, and if that
+       The first part matches an optional opening  parenthesis,  and  if  that
        character is present, sets it as the first captured substring. The sec-
-       ond  part  matches one or more characters that are not parentheses. The
+       ond part matches one or more characters that are not  parentheses.  The
        third part is a conditional subpattern that tests whether the first set
        of parentheses matched or not. If they did, that is, if the subject started
        with an opening parenthesis, the condition is true, and so the yes-pat-
-       tern  is  executed  and  a  closing parenthesis is required. Otherwise,
-       since no-pattern is not present, the  subpattern  matches  nothing.  In
-       other  words,  this  pattern  matches  a  sequence  of non-parentheses,
+       tern is executed and a  closing  parenthesis  is  required.  Otherwise,
+       since  no-pattern  is  not  present, the subpattern matches nothing. In
+       other words,  this  pattern  matches  a  sequence  of  non-parentheses,
        optionally enclosed in parentheses.
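
The behaviour of this conditional pattern can be sketched as follows. The example assumes Python's standard re module, which supports the same (?(1)...) syntax; the pattern is written without the insignificant white space, and the names OPTIONAL_PARENS and is_well_formed are hypothetical.

```python
import re

# (\()?      optional opening parenthesis, captured as group 1
# [^()]+     one or more characters that are not parentheses
# (?(1)\))   a closing parenthesis is required only if group 1 was set
OPTIONAL_PARENS = re.compile(r"(\()?[^()]+(?(1)\))")


def is_well_formed(subject):
    """Report whether the whole subject is a sequence of non-parentheses,
    optionally enclosed in one pair of parentheses."""
    return OPTIONAL_PARENS.fullmatch(subject) is not None
```

Matching the whole subject is a choice made for this sketch; it makes the unbalanced cases such as "(abc" and "abc)" fail outright instead of matching a shorter substring.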


-       If you were embedding this pattern in a larger one,  you  could  use  a
+       If  you  were  embedding  this pattern in a larger one, you could use a
        relative reference:


          ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...


-       This  makes  the  fragment independent of the parentheses in the larger
+       This makes the fragment independent of the parentheses  in  the  larger
        pattern.


    Checking for a used subpattern by name


-       Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a
-       used  subpattern  by  name.  For compatibility with earlier versions of
-       PCRE, which had this facility before Perl, the syntax  (?(name)...)  is
-       also  recognized. However, there is a possible ambiguity with this syn-
-       tax, because subpattern names may  consist  entirely  of  digits.  PCRE
-       looks  first for a named subpattern; if it cannot find one and the name
-       consists entirely of digits, PCRE looks for a subpattern of  that  num-
-       ber,  which must be greater than zero. Using subpattern names that con-
+       Perl  uses  the  syntax  (?(<name>)...) or (?('name')...) to test for a
+       used subpattern by name. For compatibility  with  earlier  versions  of
+       PCRE,  which  had this facility before Perl, the syntax (?(name)...) is
+       also recognized. However, there is a possible ambiguity with this  syn-
+       tax,  because  subpattern  names  may  consist entirely of digits. PCRE
+       looks first for a named subpattern; if it cannot find one and the  name
+       consists  entirely  of digits, PCRE looks for a subpattern of that num-
+       ber, which must be greater than zero. Using subpattern names that  con-
        sist entirely of digits is not recommended.


        Rewriting the above example to use a named subpattern gives this:


          (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )


-       If the name used in a condition of this kind is a duplicate,  the  test
-       is  applied to all subpatterns of the same name, and is true if any one
+       If  the  name used in a condition of this kind is a duplicate, the test
+       is applied to all subpatterns of the same name, and is true if any  one
        of them has matched.


    Checking for pattern recursion


        If the condition is the string (R), and there is no subpattern with the
-       name  R, the condition is true if a recursive call to the whole pattern
+       name R, the condition is true if a recursive call to the whole  pattern
        or any subpattern has been made. If digits or a name preceded by amper-
        sand follow the letter R, for example:


@@ -4764,77 +4773,77 @@

        the condition is true if the most recent recursion is into a subpattern
        whose number or name is given. This condition does not check the entire
-       recursion  stack.  If  the  name  used in a condition of this kind is a
+       recursion stack. If the name used in a condition  of  this  kind  is  a
        duplicate, the test is applied to all subpatterns of the same name, and
        is true if any one of them is the most recent recursion.


-       At  "top  level",  all  these recursion test conditions are false.  The
+       At "top level", all these recursion test  conditions  are  false.   The
        syntax for recursive patterns is described below.


    Defining subpatterns for use by reference only


-       If the condition is the string (DEFINE), and  there  is  no  subpattern
-       with  the  name  DEFINE,  the  condition is always false. In this case,
-       there may be only one alternative  in  the  subpattern.  It  is  always
-       skipped  if  control  reaches  this  point  in the pattern; the idea of
-       DEFINE is that it can be used to define "subroutines" that can be  ref-
-       erenced  from elsewhere. (The use of "subroutines" is described below.)
-       For example, a pattern to match an IPv4 address could be  written  like
+       If  the  condition  is  the string (DEFINE), and there is no subpattern
+       with the name DEFINE, the condition is  always  false.  In  this  case,
+       there  may  be  only  one  alternative  in the subpattern. It is always
+       skipped if control reaches this point  in  the  pattern;  the  idea  of
+       DEFINE  is that it can be used to define "subroutines" that can be ref-
+       erenced from elsewhere. (The use of "subroutines" is described  below.)
+       For  example,  a pattern to match an IPv4 address could be written like
        this (ignore whitespace and line breaks):


          (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
          \b (?&byte) (\.(?&byte)){3} \b


-       The  first part of the pattern is a DEFINE  group inside  which another
-       group named "byte" is defined. This matches an individual component  of
-       an  IPv4  address  (a number less than 256). When matching takes place,
-       this part of the pattern is skipped because DEFINE acts  like  a  false
-       condition.  The  rest of the pattern uses references to the named group
-       to match the four dot-separated components of an IPv4 address,  insist-
+       The first part of the pattern is a DEFINE  group  inside  which another
+       group  named "byte" is defined. This matches an individual component of
+       an IPv4 address (a number less than 256). When  matching  takes  place,
+       this  part  of  the pattern is skipped because DEFINE acts like a false
+       condition. The rest of the pattern uses references to the  named  group
+       to  match the four dot-separated components of an IPv4 address, insist-
        ing on a word boundary at each end.


    Assertion conditions


-       If  the  condition  is  not  in any of the above formats, it must be an
-       assertion.  This may be a positive or negative lookahead or  lookbehind
-       assertion.  Consider  this  pattern,  again  containing non-significant
+       If the condition is not in any of the above  formats,  it  must  be  an
+       assertion.   This may be a positive or negative lookahead or lookbehind
+       assertion. Consider  this  pattern,  again  containing  non-significant
        white space, and with the two alternatives on the second line:


          (?(?=[^a-z]*[a-z])
          \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )


-       The condition  is  a  positive  lookahead  assertion  that  matches  an
-       optional  sequence of non-letters followed by a letter. In other words,
-       it tests for the presence of at least one letter in the subject.  If  a
-       letter  is found, the subject is matched against the first alternative;
-       otherwise it is  matched  against  the  second.  This  pattern  matches
-       strings  in  one  of the two forms dd-aaa-dd or dd-dd-dd, where aaa are
+       The  condition  is  a  positive  lookahead  assertion  that  matches an
+       optional sequence of non-letters followed by a letter. In other  words,
+       it  tests  for the presence of at least one letter in the subject. If a
+       letter is found, the subject is matched against the first  alternative;
+       otherwise  it  is  matched  against  the  second.  This pattern matches
+       strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
        letters and dd are digits.



COMMENTS

-       The sequence (?# marks the start of a comment that continues up to  the
-       next  closing  parenthesis.  Nested  parentheses are not permitted. The
-       characters that make up a comment play no part in the pattern  matching
+       The  sequence (?# marks the start of a comment that continues up to the
+       next closing parenthesis. Nested parentheses  are  not  permitted.  The
+       characters  that make up a comment play no part in the pattern matching
        at all.


-       If  the PCRE_EXTENDED option is set, an unescaped # character outside a
-       character class introduces a  comment  that  continues  to  immediately
+       If the PCRE_EXTENDED option is set, an unescaped # character outside  a
+       character  class  introduces  a  comment  that continues to immediately
        after the next newline in the pattern.



RECURSIVE PATTERNS

-       Consider  the problem of matching a string in parentheses, allowing for
-       unlimited nested parentheses. Without the use of  recursion,  the  best
-       that  can  be  done  is  to use a pattern that matches up to some fixed
-       depth of nesting. It is not possible to  handle  an  arbitrary  nesting
+       Consider the problem of matching a string in parentheses, allowing  for
+       unlimited  nested  parentheses.  Without the use of recursion, the best
+       that can be done is to use a pattern that  matches  up  to  some  fixed
+       depth  of  nesting.  It  is not possible to handle an arbitrary nesting
        depth.


        For some time, Perl has provided a facility that allows regular expres-
-       sions to recurse (amongst other things). It does this by  interpolating
-       Perl  code in the expression at run time, and the code can refer to the
+       sions  to recurse (amongst other things). It does this by interpolating
+       Perl code in the expression at run time, and the code can refer to  the
        expression itself. A Perl pattern using code interpolation to solve the
        parentheses problem can be created like this:


@@ -4844,182 +4853,182 @@
        refers recursively to the pattern in which it appears.


        Obviously, PCRE cannot support the interpolation of Perl code. Instead,
-       it  supports  special  syntax  for recursion of the entire pattern, and
-       also for individual subpattern recursion.  After  its  introduction  in
-       PCRE  and  Python,  this  kind of recursion was subsequently introduced
+       it supports special syntax for recursion of  the  entire  pattern,  and
+       also  for  individual  subpattern  recursion. After its introduction in
+       PCRE and Python, this kind of  recursion  was  subsequently  introduced
        into Perl at release 5.10.


-       A special item that consists of (? followed by a  number  greater  than
+       A  special  item  that consists of (? followed by a number greater than
        zero and a closing parenthesis is a recursive call of the subpattern of
-       the given number, provided that it occurs inside that  subpattern.  (If
-       not,  it  is  a  "subroutine" call, which is described in the next sec-
-       tion.) The special item (?R) or (?0) is a recursive call of the  entire
+       the  given  number, provided that it occurs inside that subpattern. (If
+       not, it is a "subroutine" call, which is described  in  the  next  sec-
+       tion.)  The special item (?R) or (?0) is a recursive call of the entire
        regular expression.


-       This  PCRE  pattern  solves  the nested parentheses problem (assume the
+       This PCRE pattern solves the nested  parentheses  problem  (assume  the
        PCRE_EXTENDED option is set so that white space is ignored):


          \( ( [^()]++ | (?R) )* \)


-       First it matches an opening parenthesis. Then it matches any number  of
-       substrings  which  can  either  be  a sequence of non-parentheses, or a
-       recursive match of the pattern itself (that is, a  correctly  parenthe-
+       First  it matches an opening parenthesis. Then it matches any number of
+       substrings which can either be a  sequence  of  non-parentheses,  or  a
+       recursive  match  of the pattern itself (that is, a correctly parenthe-
        sized substring).  Finally there is a closing parenthesis. Note the use
        of a possessive quantifier to avoid backtracking into sequences of non-
        parentheses.


-       If  this  were  part of a larger pattern, you would not want to recurse
+       If this were part of a larger pattern, you would not  want  to  recurse
        the entire pattern, so instead you could use this:


          ( \( ( [^()]++ | (?1) )* \) )


-       We have put the pattern into parentheses, and caused the  recursion  to
+       We  have  put the pattern into parentheses, and caused the recursion to
        refer to them instead of the whole pattern.


-       In  a  larger  pattern,  keeping  track  of  parenthesis numbers can be
-       tricky. This is made easier by the use of relative references  (a  Perl
-       5.10  feature).   Instead  of  (?1)  in the pattern above you can write
+       In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
+       tricky.  This  is made easier by the use of relative references (a Perl
+       5.10 feature).  Instead of (?1) in the  pattern  above  you  can  write
        (?-2) to refer to the second most recently opened parentheses preceding
-       the  recursion.  In  other  words,  a  negative number counts capturing
+       the recursion. In other  words,  a  negative  number  counts  capturing
        parentheses leftwards from the point at which it is encountered.


-       It is also possible to refer to  subsequently  opened  parentheses,  by
-       writing  references  such  as (?+2). However, these cannot be recursive
-       because the reference is not inside the  parentheses  that  are  refer-
-       enced.  They  are  always  "subroutine" calls, as described in the next
+       It  is  also  possible  to refer to subsequently opened parentheses, by
+       writing references such as (?+2). However, these  cannot  be  recursive
+       because  the  reference  is  not inside the parentheses that are refer-
+       enced. They are always "subroutine" calls, as  described  in  the  next
        section.


-       An alternative approach is to use named parentheses instead.  The  Perl
-       syntax  for  this  is (?&name); PCRE's earlier syntax (?P>name) is also
+       An  alternative  approach is to use named parentheses instead. The Perl
+       syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also
        supported. We could rewrite the above example as follows:


          (?<pn> \( ( [^()]++ | (?&pn) )* \) )


-       If there is more than one subpattern with the same name,  the  earliest
+       If  there  is more than one subpattern with the same name, the earliest
        one is used.


-       This  particular  example pattern that we have been looking at contains
+       This particular example pattern that we have been looking  at  contains
        nested unlimited repeats, and so the use of a possessive quantifier for
        matching strings of non-parentheses is important when applying the pat-
-       tern to strings that do not match. For example, when  this  pattern  is
+       tern  to  strings  that do not match. For example, when this pattern is
        applied to


          (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()


-       it  yields  "no  match" quickly. However, if a possessive quantifier is
-       not used, the match runs for a very long time indeed because there  are
-       so  many  different  ways the + and * repeats can carve up the subject,
+       it yields "no match" quickly. However, if a  possessive  quantifier  is
+       not  used, the match runs for a very long time indeed because there are
+       so many different ways the + and * repeats can carve  up  the  subject,
        and all have to be tested before failure can be reported.


-       At the end of a match, the values of capturing  parentheses  are  those
-       from  the outermost level. If you want to obtain intermediate values, a
-       callout function can be used (see below and the pcrecallout  documenta-
+       At  the  end  of a match, the values of capturing parentheses are those
+       from the outermost level. If you want to obtain intermediate values,  a
+       callout  function can be used (see below and the pcrecallout documenta-
        tion). If the pattern above is matched against


          (ab(cd)ef)


-       the  value  for  the  inner capturing parentheses (numbered 2) is "ef",
-       which is the last value taken on at the top level. If a capturing  sub-
+       the value for the inner capturing parentheses  (numbered  2)  is  "ef",
+       which  is the last value taken on at the top level. If a capturing sub-
        pattern is not matched at the top level, its final value is unset, even
        if it is (temporarily) set at a deeper level.


-       If there are more than 15 capturing parentheses in a pattern, PCRE  has
-       to  obtain extra memory to store data during a recursion, which it does
+       If  there are more than 15 capturing parentheses in a pattern, PCRE has
+       to obtain extra memory to store data during a recursion, which it  does
        by using pcre_malloc, freeing it via pcre_free afterwards. If no memory
        can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.


-       Do  not  confuse  the (?R) item with the condition (R), which tests for
-       recursion.  Consider this pattern, which matches text in  angle  brack-
-       ets,  allowing for arbitrary nesting. Only digits are allowed in nested
-       brackets (that is, when recursing), whereas any characters are  permit-
+       Do not confuse the (?R) item with the condition (R),  which  tests  for
+       recursion.   Consider  this pattern, which matches text in angle brack-
+       ets, allowing for arbitrary nesting. Only digits are allowed in  nested
+       brackets  (that is, when recursing), whereas any characters are permit-
        ted at the outer level.


          < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >


-       In  this  pattern, (?(R) is the start of a conditional subpattern, with
-       two different alternatives for the recursive and  non-recursive  cases.
+       In this pattern, (?(R) is the start of a conditional  subpattern,  with
+       two  different  alternatives for the recursive and non-recursive cases.
        The (?R) item is the actual recursive call.


    Recursion difference from Perl


-       In  PCRE (like Python, but unlike Perl), a recursive subpattern call is
+       In PCRE (like Python, but unlike Perl), a recursive subpattern call  is
        always treated as an atomic group. That is, once it has matched some of
        the subject string, it is never re-entered, even if it contains untried
-       alternatives and there is a subsequent matching failure.  This  can  be
-       illustrated  by the following pattern, which purports to match a palin-
-       dromic string that contains an odd number of characters  (for  example,
+       alternatives  and  there  is a subsequent matching failure. This can be
+       illustrated by the following pattern, which purports to match a  palin-
+       dromic  string  that contains an odd number of characters (for example,
        "a", "aba", "abcba", "abcdcba"):


          ^(.|(.)(?1)\2)$


        The idea is that it either matches a single character, or two identical
-       characters surrounding a sub-palindrome. In Perl, this  pattern  works;
-       in  PCRE  it  does  not if the subject string is longer than three characters.
+       characters  surrounding  a sub-palindrome. In Perl, this pattern works;
+       in PCRE it does not if the subject string is longer than three characters.
        Consider the subject string "abcba":


-       At the top level, the first character is matched, but as it is  not  at
+       At  the  top level, the first character is matched, but as it is not at
        the end of the string, the first alternative fails; the second alterna-
        tive is taken and the recursion kicks in. The recursive call to subpat-
-       tern  1  successfully  matches the next character ("b"). (Note that the
+       tern 1 successfully matches the next character ("b").  (Note  that  the
        beginning and end of line tests are not part of the recursion).


-       Back at the top level, the next character ("c") is compared  with  what
-       subpattern  2 matched, which was "a". This fails. Because the recursion
-       is treated as an atomic group, there are now  no  backtracking  points,
-       and  so  the  entire  match fails. (Perl is able, at this point, to re-
-       enter the recursion and try the second alternative.)  However,  if  the
+       Back  at  the top level, the next character ("c") is compared with what
+       subpattern 2 matched, which was "a". This fails. Because the  recursion
+       is  treated  as  an atomic group, there are now no backtracking points,
+       and so the entire match fails. (Perl is able, at  this  point,  to  re-
+       enter  the  recursion  and try the second alternative.) However, if the
        pattern is written with the alternatives in the other order, things are
        different:


          ^((.)(?1)\2|.)$


-       This time, the recursing alternative is tried first, and  continues  to
-       recurse  until  it runs out of characters, at which point the recursion
-       fails. But this time we do have  another  alternative  to  try  at  the
-       higher  level.  That  is  the  big difference: in the previous case the
+       This  time,  the recursing alternative is tried first, and continues to
+       recurse until it runs out of characters, at which point  the  recursion
+       fails.  But  this  time  we  do  have another alternative to try at the
+       higher level. That is the big difference:  in  the  previous  case  the
        remaining alternative is at a deeper recursion level, which PCRE cannot
        use.
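
The difference can be sketched with a toy model of the first pattern, ^(.|(.)(?1)\2)$. This is only an emulation written in Python under stated assumptions, not PCRE or Perl code: the names group1_ends and matches are hypothetical, and the atomic flag stands for "a recursive call keeps only its first successful match" (the PCRE behaviour described here) versus "a recursive call may be re-entered" (the Perl behaviour described here).

```python
def group1_ends(subject, pos, atomic):
    """Yield, in the order a backtracking matcher would try them, the end
    offsets at which group 1 can match when started at pos."""
    if pos >= len(subject):
        return
    # First alternative: a single character.
    yield pos + 1
    # Second alternative: (.)(?1)\2
    first = subject[pos]
    for end in group1_ends(subject, pos + 1, atomic):
        if end < len(subject) and subject[end] == first:
            yield end + 1
        if atomic:
            break  # atomic call: the recursion is never re-entered


def matches(subject, atomic):
    # The ^ and $ tests apply at top level only, not inside the recursion.
    return any(end == len(subject)
               for end in group1_ends(subject, 0, atomic))


print(matches("aba", atomic=True))
print(matches("abcba", atomic=True))
print(matches("abcba", atomic=False))
```

In this model "aba" is accepted either way, while "abcba" is accepted only when the recursion can be re-entered, which is the case walked through above.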


        To change the pattern so that it matches all palindromic strings, not just
-       those with an odd number of characters, it is tempting  to  change  the
+       those  with  an  odd number of characters, it is tempting to change the
        pattern to this:


          ^((.)(?1)\2|.?)$


-       Again,  this  works  in Perl, but not in PCRE, and for the same reason.
-       When a deeper recursion has matched a single character,  it  cannot  be
-       entered  again  in  order  to match an empty string. The solution is to
-       separate the two cases, and write out the odd and even cases as  alter-
+       Again, this works in Perl, but not in PCRE, and for  the  same  reason.
+       When  a  deeper  recursion has matched a single character, it cannot be
+       entered again in order to match an empty string.  The  solution  is  to
+       separate  the two cases, and write out the odd and even cases as alter-
        natives at the higher level:


          ^(?:((.)(?1)\2|)|((.)(?3)\4|.))


-       If  you  want  to match typical palindromic phrases, the pattern has to
+       If you want to match typical palindromic phrases, the  pattern  has  to
        ignore all non-word characters, which can be done like this:


          ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$


        If run with the PCRE_CASELESS option, this pattern matches phrases such
        as "A man, a plan, a canal: Panama!" and it works well in both PCRE and
-       Perl. Note the use of the possessive quantifier *+ to avoid  backtrack-
-       ing  into  sequences of non-word characters. Without this, PCRE takes a
-       great deal longer (ten times or more) to  match  typical  phrases,  and
+       Perl.  Note the use of the possessive quantifier *+ to avoid backtrack-
+       ing into sequences of non-word characters. Without this, PCRE  takes  a
+       great  deal  longer  (ten  times or more) to match typical phrases, and
        Perl takes so long that you think it has gone into a loop.


-       WARNING:  The  palindrome-matching patterns above work only if the sub-
-       ject string does not start with a palindrome that is shorter  than  the
-       entire  string.  For example, although "abcba" is correctly matched, if
-       the subject is "ababa", PCRE finds the palindrome "aba" at  the  start,
-       then  fails at top level because the end of the string does not follow.
-       Once again, it cannot jump back into the recursion to try other  alter-
+       WARNING: The palindrome-matching patterns above work only if  the  sub-
+       ject  string  does not start with a palindrome that is shorter than the
+       entire string.  For example, although "abcba" is correctly matched,  if
+       the  subject  is "ababa", PCRE finds the palindrome "aba" at the start,
+       then fails at top level because the end of the string does not  follow.
+       Once  again, it cannot jump back into the recursion to try other alter-
        natives, so the entire match fails.



SUBPATTERNS AS SUBROUTINES

        If the syntax for a recursive subpattern reference (either by number or
-       by name) is used outside the parentheses to which it refers,  it  oper-
-       ates  like a subroutine in a programming language. The "called" subpat-
+       by  name)  is used outside the parentheses to which it refers, it oper-
+       ates like a subroutine in a programming language. The "called"  subpat-
        tern may be defined before or after the reference. A numbered reference
        can be absolute or relative, as in these examples:


@@ -5031,113 +5040,113 @@

          (sens|respons)e and \1ibility


-       matches  "sense and sensibility" and "response and responsibility", but
+       matches "sense and sensibility" and "response and responsibility",  but
        not "sense and responsibility". If instead the pattern


          (sens|respons)e and (?1)ibility


-       is used, it does match "sense and responsibility" as well as the  other
-       two  strings.  Another  example  is  given  in the discussion of DEFINE
+       is  used, it does match "sense and responsibility" as well as the other
+       two strings. Another example is  given  in  the  discussion  of  DEFINE
        above.
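
A small sketch of the distinction, using Python's re module purely for illustration. Python supports the back reference \1 but has no (?1) subroutine call, so the second pattern below repeats the text of the group by hand; that is an assumption-laden stand-in for (?1), adequate only for this simple case. The helper name both is hypothetical.

```python
import re

# Back reference: the second occurrence must repeat what group 1 matched.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Stand-in for (?1): the subpattern itself is applied again, so either
# alternative may match the second time.
subroutine_like = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")


def both(subject):
    """Return (back reference matched, subroutine-style pattern matched)."""
    return (backref.fullmatch(subject) is not None,
            subroutine_like.fullmatch(subject) is not None)


for text in ("sense and sensibility",
             "response and responsibility",
             "sense and responsibility"):
    print(text, both(text))
```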


-       Like recursive subpatterns, a subroutine call is always treated  as  an
-       atomic  group. That is, once it has matched some of the subject string,
-       it is never re-entered, even if it contains  untried  alternatives  and
-       there  is a subsequent matching failure. Any capturing parentheses that
-       are set during the subroutine call  revert  to  their  previous  values
+       Like  recursive  subpatterns, a subroutine call is always treated as an
+       atomic group. That is, once it has matched some of the subject  string,
+       it  is  never  re-entered, even if it contains untried alternatives and
+       there is a subsequent matching failure. Any capturing parentheses  that
+       are  set  during  the  subroutine  call revert to their previous values
        afterwards.


-       When  a  subpattern is used as a subroutine, processing options such as
+       When a subpattern is used as a subroutine, processing options  such  as
        case-independence are fixed when the subpattern is defined. They cannot
        be changed for different calls. For example, consider this pattern:


          (abc)(?i:(?-1))


-       It  matches  "abcabc". It does not match "abcABC" because the change of
+       It matches "abcabc". It does not match "abcABC" because the  change  of
        processing option does not affect the called subpattern.



ONIGURUMA SUBROUTINE SYNTAX

-       For compatibility with Oniguruma, the non-Perl syntax \g followed by  a
+       For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
        name or a number enclosed either in angle brackets or single quotes, is
-       an alternative syntax for referencing a  subpattern  as  a  subroutine,
-       possibly  recursively. Here are two of the examples used above, rewrit-
+       an  alternative  syntax  for  referencing a subpattern as a subroutine,
+       possibly recursively. Here are two of the examples used above,  rewrit-
        ten using this syntax:


          (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
          (sens|respons)e and \g'1'ibility


-       PCRE supports an extension to Oniguruma: if a number is preceded  by  a
+       PCRE  supports  an extension to Oniguruma: if a number is preceded by a
        plus or a minus sign it is taken as a relative reference. For example:


          (abc)(?i:\g<-1>)


-       Note  that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not
-       synonymous. The former is a back reference; the latter is a  subroutine
+       Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not
+       synonymous.  The former is a back reference; the latter is a subroutine
        call.



CALLOUTS

        Perl has a feature whereby using the sequence (?{...}) causes arbitrary
-       Perl code to be obeyed in the middle of matching a regular  expression.
+       Perl  code to be obeyed in the middle of matching a regular expression.
        This makes it possible, amongst other things, to extract different sub-
        strings that match the same pair of parentheses when there is a repeti-
        tion.


        PCRE provides a similar feature, but of course it cannot obey arbitrary
        Perl code. The feature is called "callout". The caller of PCRE provides
-       an  external function by putting its entry point in the global variable
-       pcre_callout.  By default, this variable contains NULL, which  disables
+       an external function by putting its entry point in the global  variable
+       pcre_callout.   By default, this variable contains NULL, which disables
        all calling out.


-       Within  a  regular  expression,  (?C) indicates the points at which the
-       external function is to be called. If you want  to  identify  different
-       callout  points, you can put a number less than 256 after the letter C.
-       The default value is zero.  For example, this pattern has  two  callout
+       Within a regular expression, (?C) indicates the  points  at  which  the
+       external  function  is  to be called. If you want to identify different
+       callout points, you can put a number less than 256 after the letter  C.
+       The  default  value is zero.  For example, this pattern has two callout
        points:


          (?C1)abc(?C2)def


        If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are
-       automatically installed before each item in the pattern. They  are  all
+       automatically  installed  before each item in the pattern. They are all
        numbered 255.


        During matching, when PCRE reaches a callout point (and pcre_callout is
-       set), the external function is called. It is provided with  the  number
-       of  the callout, the position in the pattern, and, optionally, one item
-       of data originally supplied by the caller of pcre_exec().  The  callout
-       function  may cause matching to proceed, to backtrack, or to fail alto-
+       set),  the  external function is called. It is provided with the number
+       of the callout, the position in the pattern, and, optionally, one  item
+       of  data  originally supplied by the caller of pcre_exec(). The callout
+       function may cause matching to proceed, to backtrack, or to fail  alto-
        gether. A complete description of the interface to the callout function
        is given in the pcrecallout documentation.



BACKTRACKING CONTROL

-       Perl  5.10 introduced a number of "Special Backtracking Control Verbs",
+       Perl 5.10 introduced a number of "Special Backtracking Control  Verbs",
        which are described in the Perl documentation as "experimental and sub-
-       ject  to  change or removal in a future version of Perl". It goes on to
-       say: "Their usage in production code should be noted to avoid  problems
+       ject to change or removal in a future version of Perl". It goes  on  to
+       say:  "Their usage in production code should be noted to avoid problems
        during upgrades." The same remarks apply to the PCRE features described
        in this section.


-       Since these verbs are specifically related  to  backtracking,  most  of
-       them  can  be  used  only  when  the  pattern  is  to  be matched using
+       Since  these  verbs  are  specifically related to backtracking, most of
+       them can be  used  only  when  the  pattern  is  to  be  matched  using
        pcre_exec(), which uses a backtracking algorithm. With the exception of
        (*FAIL), which behaves like a failing negative assertion, they cause an
        error if encountered by pcre_dfa_exec().


        If any of these verbs are used in an assertion or subroutine subpattern
-       (including  recursive  subpatterns),  their  effect is confined to that
-       subpattern; it does not extend to the surrounding  pattern.  Note  that
-       such  subpatterns are processed as anchored at the point where they are
+       (including recursive subpatterns), their effect  is  confined  to  that
+       subpattern;  it  does  not extend to the surrounding pattern. Note that
+       such subpatterns are processed as anchored at the point where they  are
        tested.


-       The new verbs make use of what was previously invalid syntax: an  open-
+       The  new verbs make use of what was previously invalid syntax: an open-
        ing parenthesis followed by an asterisk. In Perl, they are generally of
        the form (*VERB:ARG) but PCRE does not support the use of arguments, so
-       its  general  form is just (*VERB). Any number of these verbs may occur
+       its general form is just (*VERB). Any number of these verbs  may  occur
        in a pattern. There are two kinds:


    Verbs that act immediately
@@ -5146,94 +5155,94 @@


           (*ACCEPT)


-       This verb causes the match to end successfully, skipping the  remainder
-       of  the pattern. When inside a recursion, only the innermost pattern is
-       ended immediately. If (*ACCEPT) is inside  capturing  parentheses,  the
-       data  so  far  is  captured. (This feature was added to PCRE at release
+       This  verb causes the match to end successfully, skipping the remainder
+       of the pattern. When inside a recursion, only the innermost pattern  is
+       ended  immediately.  If  (*ACCEPT) is inside capturing parentheses, the
+       data so far is captured. (This feature was added  to  PCRE  at  release
        8.00.) For example:


          A((?:A|B(*ACCEPT)|C)D)


-       This matches "AB", "AAD", or "ACD"; when it matches "AB", "B"  is  cap-
+       This  matches  "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
        tured by the outer parentheses.


          (*FAIL) or (*F)


-       This  verb  causes the match to fail, forcing backtracking to occur. It
-       is equivalent to (?!) but easier to read. The Perl documentation  notes
-       that  it  is  probably  useful only when combined with (?{}) or (??{}).
-       Those are, of course, Perl features that are not present in  PCRE.  The
-       nearest  equivalent is the callout feature, as for example in this pat-
+       This verb causes the match to fail, forcing backtracking to  occur.  It
+       is  equivalent to (?!) but easier to read. The Perl documentation notes
+       that it is probably useful only when combined  with  (?{})  or  (??{}).
+       Those  are,  of course, Perl features that are not present in PCRE. The
+       nearest equivalent is the callout feature, as for example in this  pat-
        tern:


          a+(?C)(*FAIL)


-       A match with the string "aaaa" always fails, but the callout  is  taken
+       A  match  with the string "aaaa" always fails, but the callout is taken
        before each backtrack happens (in this example, 10 times).


    Verbs that act after backtracking


        The following verbs do nothing when they are encountered. Matching con-
-       tinues with what follows, but if there is no subsequent match, a  fail-
-       ure  is  forced.   The  verbs  differ  in  exactly what kind of failure
+       tinues  with what follows, but if there is no subsequent match, a fail-
+       ure is forced.  The verbs  differ  in  exactly  what  kind  of  failure
        occurs.


          (*COMMIT)


-       This verb causes the whole match to fail outright if the  rest  of  the
-       pattern  does  not match. Even if the pattern is unanchored, no further
-       attempts to find a match by advancing the starting  point  take  place.
-       Once  (*COMMIT)  has been passed, pcre_exec() is committed to finding a
+       This  verb  causes  the whole match to fail outright if the rest of the
+       pattern does not match. Even if the pattern is unanchored,  no  further
+       attempts  to  find  a match by advancing the starting point take place.
+       Once (*COMMIT) has been passed, pcre_exec() is committed to  finding  a
        match at the current starting point, or not at all. For example:


          a+(*COMMIT)b


-       This matches "xxaab" but not "aacaab". It can be thought of as  a  kind
+       This  matches  "xxaab" but not "aacaab". It can be thought of as a kind
        of dynamic anchor, or "I've started, so I must finish."


          (*PRUNE)


-       This  verb causes the match to fail at the current position if the rest
+       This verb causes the match to fail at the current position if the  rest
        of the pattern does not match. If the pattern is unanchored, the normal
-       "bumpalong"  advance to the next starting character then happens. Back-
-       tracking can occur as usual to the left of (*PRUNE), or  when  matching
-       to  the right of (*PRUNE), but if there is no match to the right, back-
-       tracking cannot cross (*PRUNE).  In simple cases, the use  of  (*PRUNE)
+       "bumpalong" advance to the next starting character then happens.  Back-
+       tracking  can  occur as usual to the left of (*PRUNE), or when matching
+       to the right of (*PRUNE), but if there is no match to the right,  back-
+       tracking  cannot  cross (*PRUNE).  In simple cases, the use of (*PRUNE)
        is just an alternative to an atomic group or possessive quantifier, but
-       there are some uses of (*PRUNE) that cannot be expressed in  any  other
+       there  are  some uses of (*PRUNE) that cannot be expressed in any other
        way.


          (*SKIP)


-       This  verb  is like (*PRUNE), except that if the pattern is unanchored,
-       the "bumpalong" advance is not to the next character, but to the  posi-
-       tion  in  the  subject where (*SKIP) was encountered. (*SKIP) signifies
-       that whatever text was matched leading up to it cannot  be  part  of  a
+       This verb is like (*PRUNE), except that if the pattern  is  unanchored,
+       the  "bumpalong" advance is not to the next character, but to the posi-
+       tion in the subject where (*SKIP) was  encountered.  (*SKIP)  signifies
+       that  whatever  text  was  matched leading up to it cannot be part of a
        successful match. Consider:


          a+(*SKIP)b


-       If  the  subject  is  "aaaac...",  after  the first match attempt fails
-       (starting at the first character in the  string),  the  starting  point
+       If the subject is "aaaac...",  after  the  first  match  attempt  fails
+       (starting  at  the  first  character in the string), the starting point
        skips on to start the next attempt at "c". Note that a possessive quan-
-       tifier does not have the same effect as this example; although it  would
-       suppress  backtracking  during  the  first  match  attempt,  the second
-       attempt would start at the second character instead of skipping  on  to
+       tifier  does not have the same effect as this example; although it would
+       suppress backtracking  during  the  first  match  attempt,  the  second
+       attempt  would  start at the second character instead of skipping on to
        "c".


          (*THEN)


        This verb causes a skip to the next alternation if the rest of the pat-
        tern does not match. That is, it cancels pending backtracking, but only
-       within  the  current  alternation.  Its name comes from the observation
+       within the current alternation. Its name  comes  from  the  observation
        that it can be used for a pattern-based if-then-else block:


          ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...


-       If the COND1 pattern matches, FOO is tried (and possibly further  items
-       after  the  end  of  the group if FOO succeeds); on failure the matcher
-       skips to the second alternative and tries COND2,  without  backtracking
-       into  COND1.  If  (*THEN)  is  used outside of any alternation, it acts
+       If  the COND1 pattern matches, FOO is tried (and possibly further items
+       after the end of the group if FOO succeeds);  on  failure  the  matcher
+       skips  to  the second alternative and tries COND2, without backtracking
+       into COND1. If (*THEN) is used outside  of  any  alternation,  it  acts
        exactly like (*PRUNE).
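
The effect of (*COMMIT), (*PRUNE) and (*SKIP) on the "bumpalong" advance can be sketched with a toy emulation of an unanchored search for a+(*VERB)b. This is Python written only for illustration, not PCRE code: the function name search_a_plus_b and its verb argument are hypothetical, and the model is valid only for this one pattern (after a greedy run of "a" fails to be followed by "b", giving back characters cannot help, so no inner backtracking is modelled).

```python
def search_a_plus_b(subject, verb=None):
    """Emulate an unanchored search for a+(*VERB)b.
    Returns (match span or None, list of start offsets that were tried)."""
    starts = []
    start = 0
    while start < len(subject):
        starts.append(start)
        end = start
        while end < len(subject) and subject[end] == "a":
            end += 1                      # greedy a+
        if end == start:
            start += 1                    # a+ failed before the verb was reached
            continue
        if end < len(subject) and subject[end] == "b":
            return (start, end + 1), starts
        # The rest of the pattern failed after the verb had been passed.
        if verb == "COMMIT":
            return None, starts           # no further starting points are tried
        if verb == "SKIP":
            start = end                   # restart where (*SKIP) was encountered
        else:
            start += 1                    # no verb, or (*PRUNE): normal bumpalong
    return None, starts


print(search_a_plus_b("xxaab", "COMMIT"))
print(search_a_plus_b("aacaab", "COMMIT"))
print(search_a_plus_b("aaaac", "SKIP"))
print(search_a_plus_b("aaaac", "PRUNE"))
```

The list of start offsets shows the difference: with SKIP the second attempt on "aaaac" begins at the "c", whereas with PRUNE (or no verb) every intermediate offset is tried.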



@@ -5251,7 +5260,7 @@

REVISION

-       Last updated: 11 January 2010
+       Last updated: 06 March 2010
        Copyright (c) 1997-2010 University of Cambridge.
 ------------------------------------------------------------------------------


@@ -5363,16 +5372,19 @@

SCRIPT NAMES FOR \p AND \P

-       Arabic,  Armenian,  Balinese,  Bengali,  Bopomofo,  Braille,  Buginese,
-       Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee, Common, Coptic, Cu-
-       neiform,  Cypriot,  Cyrillic,  Deseret, Devanagari, Ethiopic, Georgian,
-       Glagolitic, Gothic, Greek, Gujarati, Gurmukhi,  Han,  Hangul,  Hanunoo,
-       Hebrew,  Hiragana,  Inherited, Kannada, Katakana, Kayah_Li, Kharoshthi,
-       Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lycian, Lydian,  Malayalam,
-       Mongolian,  Myanmar,  New_Tai_Lue, Nko, Ogham, Old_Italic, Old_Persian,
-       Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Saurash-
-       tra,  Shavian,  Sinhala,  Sudanese, Syloti_Nagri, Syriac, Tagalog, Tag-
-       banwa,  Tai_Le,  Tamil,  Telugu,  Thaana,  Thai,   Tibetan,   Tifinagh,
+       Arabic, Armenian, Avestan, Balinese, Bamum, Bengali, Bopomofo, Braille,
+       Buginese, Buhid, Canadian_Aboriginal, Carian, Cham,  Cherokee,  Common,
+       Coptic,   Cuneiform,  Cypriot,  Cyrillic,  Deseret,  Devanagari,  Egyp-
+       tian_Hieroglyphs,  Ethiopic,  Georgian,  Glagolitic,   Gothic,   Greek,
+       Gujarati,  Gurmukhi,  Han,  Hangul,  Hanunoo,  Hebrew,  Hiragana, Impe-
+       rial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian,
+       Javanese,  Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao,
+       Latin,  Lepcha,  Limbu,  Linear_B,  Lisu,  Lycian,  Lydian,  Malayalam,
+       Meetei_Mayek,  Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Old_Italic,
+       Old_Persian, Old_South_Arabian, Old_Turkic, Ol_Chiki,  Oriya,  Osmanya,
+       Phags_Pa,  Phoenician,  Rejang,  Runic, Samaritan, Saurashtra, Shavian,
+       Sinhala, Sundanese, Syloti_Nagri, Syriac,  Tagalog,  Tagbanwa,  Tai_Le,
+       Tai_Tham,  Tai_Viet,  Tamil,  Telugu,  Thaana, Thai, Tibetan, Tifinagh,
        Ugaritic, Vai, Yi.



@@ -5552,7 +5564,7 @@
          (*ACCEPT)       force successful match
          (*FAIL)         force backtrack; synonym (*F)


-       The  following  act only when a subsequent match failure causes a back-
+       The following act only when a subsequent match failure causes  a  back-
        track to reach them. They all force a match failure, but they differ in
        what happens afterwards. Those that advance the start-of-match point do
        so only if the pattern is not anchored.
@@ -5565,7 +5577,7 @@


NEWLINE CONVENTIONS

-       These are recognized only at the very start of the pattern or  after  a
+       These  are  recognized only at the very start of the pattern or after a
        (*BSR_...) or (*UTF8) option.


          (*CR)           carriage return only
@@ -5577,7 +5589,7 @@


WHAT \R MATCHES

-       These  are  recognized only at the very start of the pattern or after a
+       These are recognized only at the very start of the pattern or  after  a
        (*...) option that sets the newline convention or UTF-8 mode.


          (*BSR_ANYCRLF)  CR, LF, or CRLF
@@ -5604,8 +5616,8 @@


REVISION

-       Last updated: 11 April 2009
-       Copyright (c) 1997-2009 University of Cambridge.
+       Last updated: 01 March 2010
+       Copyright (c) 1997-2010 University of Cambridge.
 ------------------------------------------------------------------------------



@@ -6129,14 +6141,14 @@
        can affect both of them.



-MEMORY USAGE
+COMPILED PATTERN MEMORY USAGE

        Patterns are compiled by PCRE into a reasonably efficient byte code, so
        that most simple patterns do not use much memory. However, there is one
-       case where memory usage can be unexpectedly large. When a parenthesized
-       subpattern has a quantifier with a minimum greater than 1 and/or a lim-
-       ited  maximum,  the  whole subpattern is repeated in the compiled code.
-       For example, the pattern
+       case  where  the memory usage of a compiled pattern can be unexpectedly
+       large. If a parenthesized subpattern has a quantifier  with  a  minimum
+       greater  than  1  and/or  a  limited  maximum,  the whole subpattern is
+       repeated in the compiled code. For example, the pattern


          (abc|def){2,4}


@@ -6178,73 +6190,83 @@
        otherwise handle.



+STACK USAGE AT RUN TIME
+
+       When  pcre_exec()  is  used  for matching, certain kinds of pattern can
+       cause it to use large amounts of the process stack.  In  some  environ-
+       ments  the default process stack is quite small, and if it runs out the
+       result is often SIGSEGV.  This issue is probably  the  most  frequently
+       raised  problem  with  PCRE. Rewriting your pattern can often help. The
+       pcrestack documentation discusses this issue in detail.
+
+
 PROCESSING TIME


-       Certain  items  in regular expression patterns are processed more effi-
+       Certain items in regular expression patterns are processed  more  effi-
        ciently than others. It is more efficient to use a character class like
-       [aeiou]   than   a   set   of  single-character  alternatives  such  as
-       (a|e|i|o|u). In general, the simplest construction  that  provides  the
+       [aeiou]  than  a  set  of   single-character   alternatives   such   as
+       (a|e|i|o|u).  In  general,  the simplest construction that provides the
        required behaviour is usually the most efficient. Jeffrey Friedl's book
-       contains a lot of useful general discussion  about  optimizing  regular
-       expressions  for  efficient  performance.  This document contains a few
+       contains  a  lot  of useful general discussion about optimizing regular
+       expressions for efficient performance. This  document  contains  a  few
        observations about PCRE.


-       Using Unicode character properties (the \p,  \P,  and  \X  escapes)  is
-       slow,  because PCRE has to scan a structure that contains data for over
-       fifteen thousand characters whenever it needs a  character's  property.
-       If  you  can  find  an  alternative pattern that does not use character
+       Using  Unicode  character  properties  (the  \p, \P, and \X escapes) is
+       slow, because PCRE has to scan a structure that contains data for  over
+       fifteen  thousand  characters whenever it needs a character's property.
+       If you can find an alternative pattern  that  does  not  use  character
        properties, it will probably be faster.


-       When a pattern begins with .* not in  parentheses,  or  in  parentheses
+       When  a  pattern  begins  with .* not in parentheses, or in parentheses
        that are not the subject of a backreference, and the PCRE_DOTALL option
-       is set, the pattern is implicitly anchored by PCRE, since it can  match
-       only  at  the start of a subject string. However, if PCRE_DOTALL is not
-       set, PCRE cannot make this optimization, because  the  .  metacharacter
-       does  not then match a newline, and if the subject string contains new-
-       lines, the pattern may match from the character  immediately  following
+       is  set, the pattern is implicitly anchored by PCRE, since it can match
+       only at the start of a subject string. However, if PCRE_DOTALL  is  not
+       set,  PCRE  cannot  make this optimization, because the . metacharacter
+       does not then match a newline, and if the subject string contains  new-
+       lines,  the  pattern may match from the character immediately following
        one of them instead of from the very start. For example, the pattern


          .*second


-       matches  the subject "first\nand second" (where \n stands for a newline
-       character), with the match starting at the seventh character. In  order
+       matches the subject "first\nand second" (where \n stands for a  newline
+       character),  with the match starting at the seventh character. In order
        to do this, PCRE has to retry the match starting after every newline in
        the subject.


-       If you are using such a pattern with subject strings that do  not  con-
+       If  you  are using such a pattern with subject strings that do not con-
        tain newlines, the best performance is obtained by setting PCRE_DOTALL,
-       or starting the pattern with ^.* or ^.*? to indicate  explicit  anchor-
-       ing.  That saves PCRE from having to scan along the subject looking for
+       or  starting  the pattern with ^.* or ^.*? to indicate explicit anchor-
+       ing. That saves PCRE from having to scan along the subject looking  for
        a newline to restart at.


-       Beware of patterns that contain nested indefinite  repeats.  These  can
-       take  a  long time to run when applied to a string that does not match.
+       Beware  of  patterns  that contain nested indefinite repeats. These can
+       take a long time to run when applied to a string that does  not  match.
        Consider the pattern fragment


          ^(a+)*


-       This can match "aaaa" in 16 different ways, and this  number  increases
-       very  rapidly  as the string gets longer. (The * repeat can match 0, 1,
-       2, 3, or 4 times, and for each of those cases other than 0 or 4, the  +
-       repeats  can  match  different numbers of times.) When the remainder of
+       This  can  match "aaaa" in 16 different ways, and this number increases
+       very rapidly as the string gets longer. (The * repeat can match  0,  1,
+       2,  3, or 4 times, and for each of those cases other than 0 or 4, the +
+       repeats can match different numbers of times.) When  the  remainder  of
        the pattern is such that the entire match is going to fail, PCRE has in
-       principle  to  try  every  possible  variation,  and  this  can take an
+       principle to try  every  possible  variation,  and  this  can  take  an
        extremely long time, even for relatively short strings.


        An optimization catches some of the more simple cases such as


          (a+)*b


-       where a literal character follows. Before  embarking  on  the  standard
-       matching  procedure,  PCRE checks that there is a "b" later in the sub-
-       ject string, and if there is not, it fails the match immediately.  How-
-       ever,  when  there  is no following literal this optimization cannot be
+       where  a  literal  character  follows. Before embarking on the standard
+       matching procedure, PCRE checks that there is a "b" later in  the  sub-
+       ject  string, and if there is not, it fails the match immediately. How-
+       ever, when there is no following literal this  optimization  cannot  be
        used. You can see the difference by comparing the behaviour of


          (a+)*\d


-       with the pattern above. The former gives  a  failure  almost  instantly
-       when  applied  to  a  whole  line of "a" characters, whereas the latter
+       with  the  pattern  above.  The former gives a failure almost instantly
+       when applied to a whole line of  "a"  characters,  whereas  the  latter
        takes an appreciable time with strings longer than about 20 characters.


        In many cases, the solution to this kind of performance issue is to use
@@ -6260,8 +6282,8 @@


REVISION

-       Last updated: 06 March 2007
-       Copyright (c) 1997-2007 University of Cambridge.
+       Last updated: 07 March 2010
+       Copyright (c) 1997-2010 University of Cambridge.
 ------------------------------------------------------------------------------
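
The "16 different ways" figure quoted in the PROCESSING TIME text for ^(a+)* against "aaaa" can be checked with a small C sketch. It does not call PCRE. The function count_ways is a hypothetical helper written only for this illustration, and it models the size of the search space, not PCRE's actual matching code. At each position the * repeat may stop, or the inner a+ may consume 1..remaining characters before the * repeat tries again.

```c
/* Count the distinct ways the nested repeats in ^(a+)* can consume a
   prefix of a run of "remaining" 'a' characters. Hypothetical helper for
   illustration only; it deliberately walks every path, as a backtracking
   matcher has to do in principle when the overall match is going to fail. */
unsigned long count_ways(unsigned int remaining)
{
unsigned long total = 1;          /* the * repeat stops at this point */
unsigned int k;
for (k = 1; k <= remaining; k++)  /* the inner a+ consumes k characters */
  total += count_ways(remaining - k);
return total;
}
```

Under this model count_ways(4) is 16, and the count doubles with every extra character, which is consistent with the remark that the number "increases very rapidly as the string gets longer".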




Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/doc/pcrepattern.3    2010-03-10 16:08:01 UTC (rev 507)
@@ -738,8 +738,8 @@
 .sp
 matches "foobar", the first substring is still set to "foo".
 .P
-Perl documents that the use of \eK within assertions is "not well defined". In 
-PCRE, \eK is acted upon when it occurs inside positive assertions, but is 
+Perl documents that the use of \eK within assertions is "not well defined". In
+PCRE, \eK is acted upon when it occurs inside positive assertions, but is
 ignored in negative assertions.
 .
 .


Modified: code/trunk/maint/README
===================================================================
--- code/trunk/maint/README    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/maint/README    2010-03-10 16:08:01 UTC (rev 507)
@@ -1,5 +1,5 @@
 MAINTENANCE README FOR PCRE
----------------------------
+===========================


The files in the "maint" directory of the PCRE source contain data, scripts,
and programs that are used for the maintenance of PCRE, but which do not form
@@ -14,14 +14,14 @@


Files in the maint directory
-----------------------------
+============================

------------------ This file is now OBSOLETE and no longer used ----------------
+---------------- This file is now OBSOLETE and no longer used ----------------
 Builducptable    A Perl script that creates the contents of the ucptable.h file
                  from two Unicode data files, which themselves are downloaded
                  from the Unicode web site. Run this script in the "maint"
                  directory.
------------------ This file is now OBSOLETE and no longer used ----------------
+---------------- This file is now OBSOLETE and no longer used ----------------


 GenerateUtt.py   A Python script to generate part of the pcre_tables.c file
                  that contains Unicode script names in a long string with
@@ -61,7 +61,7 @@



Updating to a new Unicode release
----------------------------------
+=================================

When there is a new release of Unicode, the files in Unicode.tables must be
refreshed from the web site. If the new version of Unicode adds new character
@@ -88,7 +88,7 @@


Preparing for a PCRE release
-----------------------------
+============================

This section contains a checklist of things that I consult before building a
distribution for a new release.
@@ -135,7 +135,9 @@
Many of these won't need changing, but over the long term things do change.

. Man pages: Check all man pages for \ not followed by e or f or " because
- that indicates a markup error.
+ that indicates a markup error. However, there is one exception: pcredemo.3,
+ which is created from the pcredemo.c program. It contains three instances
+ of \\n.

. When the release is built, test it on a number of different operating
systems if possible, and using different compilers as well. For example,
@@ -145,7 +147,7 @@


Making a PCRE release
----------------------
+=====================

 Run PrepareRelease and commit the files that it changes (by removing trailing
 spaces). Then run "make distcheck" to create the tarballs and the zipball.
@@ -155,11 +157,12 @@
            svn://vcs.exim.org/pcre/code/tags/pcre-8.xx 


Don't forget to update Freshmeat when the new release is out, and to tell
-webmaster@??? and the mailing list.
+webmaster@??? and the mailing list. Also, update the list of version
+numbers in Bugzilla (edit products).


Future ideas (wish list)
-------------------------
+========================

This section records a list of ideas so that they do not get forgotten. They
vary enormously in their usefulness and potential for implementation. Some are
@@ -280,7 +283,7 @@
. Callouts with arguments: (?Cn:ARG) for instance.

. A user is going to supply a patch to generalize the API for user-specific
- memory allocation so that it is more flexible in threaded environments. Thiw
+ memory allocation so that it is more flexible in threaded environments. This
was promised a long time ago, and never appeared...

. Write a function that generates random matching strings for a compiled regex.
@@ -309,8 +312,13 @@
. PCRE cannot at present distinguish between subpatterns with different names,
but the same number (created by the use of ?|). In order to do so, a way of
remembering *which* subpattern numbered n matched is needed. Bugzilla #760.
+
+. Instead of having #ifdef HAVE_CONFIG_H in each module, put #include
+ "something" and then the #ifdef appears only in one place, in "something".
+
+. Support for (*MARK) and arguments for (*PRUNE) and friends.

Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 26 September 2009
+Last updated: 10 March 2010

Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/pcre_compile.c    2010-03-10 16:08:01 UTC (rev 507)
@@ -92,7 +92,7 @@


#define COMPILE_WORK_SIZE (4096)

-/* The overrun tests check for a slightly smaller size so that they detect the
+/* The overrun tests check for a slightly smaller size so that they detect the
overrun before it actually does run off the end of the data block. */

#define WORK_SIZE_CHECK (COMPILE_WORK_SIZE - 100)
@@ -268,10 +268,10 @@
it is now one long string. We cannot use a table of offsets, because the
lengths of inserts such as XSTRING(MAX_NAME_SIZE) are not known. Instead, we
simply count through to the one we want - this isn't a performance issue
-because these strings are used only when there is a compilation error.
+because these strings are used only when there is a compilation error.

-Each substring ends with \0 to insert a null character. This includes the final
-substring, so that the whole string ends with \0\0, which can be detected when
+Each substring ends with \0 to insert a null character. This includes the final
+substring, so that the whole string ends with \0\0, which can be detected when
counting through. */

static const char error_texts[] =
@@ -511,11 +511,11 @@
find_error_text(int n)
{
const char *s = error_texts;
-for (; n > 0; n--)
+for (; n > 0; n--)
{
while (*s++ != 0) {};
if (*s == 0) return "Error text not found (please report)";
- }
+ }
return s;
}

@@ -1807,7 +1807,7 @@
const uschar *ccode;

c = *code;
-
+
/* Skip over forward assertions; the other assertions are skipped by
first_significant_code() with a TRUE final argument. */

@@ -1827,13 +1827,13 @@
     c = *code;
     continue;
     }
-    
+
   /* For a recursion/subroutine call, if its end has been reached, which
   implies a subroutine call, we can scan it. */
-  
+
   if (c == OP_RECURSE)
     {
-    BOOL empty_branch = FALSE; 
+    BOOL empty_branch = FALSE;
     const uschar *scode = cd->start_code + GET(code, 1);
     if (GET(scode, 1) == 0) return TRUE;    /* Unclosed */
     do
@@ -1841,14 +1841,14 @@
       if (could_be_empty_branch(scode, endcode, utf8, cd))
         {
         empty_branch = TRUE;
-        break;  
-        }  
+        break;
+        }
       scode += GET(scode, 1);
       }
     while (*scode == OP_ALT);
     if (!empty_branch) return FALSE;  /* All branches are non-empty */
     continue;
-    }   
+    }


/* For other groups, scan the branches. */

@@ -2004,9 +2004,9 @@
#endif

     /* None of the remaining opcodes are required to match a character. */
-     
+
     default:
-    break;  
+    break;
     }
   }


@@ -2029,7 +2029,7 @@
   endcode     points to where to stop (current RECURSE item)
   bcptr       points to the chain of current (unclosed) branch starts
   utf8        TRUE if in UTF-8 mode
-  cd          pointers to tables etc 
+  cd          pointers to tables etc


 Returns:      TRUE if what is matched could be empty
 */
@@ -4475,7 +4475,7 @@


         /* Because we are moving code along, we must ensure that any
         pending recursive references are updated. */
-          
+
         default:
         *code = OP_END;
         adjust_recurse(tempcode, 1 + LINK_SIZE, utf8, cd, save_hwm);
@@ -5197,11 +5197,11 @@
                 *errorcodeptr = ERR15;
                 goto FAILED;
                 }
-                 
+
               /* Fudge the value of "called" so that when it is inserted as an
-              offset below, what it actually inserted is the reference number 
+              offset below, what it actually inserted is the reference number
               of the group. */
-                
+
               called = cd->start_code + recno;
               PUTINC(cd->hwm, 0, code + 2 + LINK_SIZE - cd->start_code);
               }
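
The comment above error_texts in pcre_compile.c describes one long string whose substrings each end with \0, so that the whole string ends with \0\0, and find_error_text() counts through it. A minimal self-contained sketch of that layout follows. The names demo_texts and demo_find_text and the three messages are assumptions made for this example, and the sketch returns NULL where the real function returns a fixed "not found" message.

```c
#include <stddef.h>

/* Hypothetical message table. Each entry ends with an explicit \0; the
   string literal's own terminator supplies the second \0 at the very end. */
static const char demo_texts[] =
  "no error\0"
  "missing )\0"
  "nothing to repeat\0";

/* Return entry number n (counting from 0), or NULL if n is out of range. */
const char *demo_find_text(int n)
{
const char *s = demo_texts;
for (; n > 0; n--)
  {
  while (*s++ != 0) {}          /* skip to the start of the next entry */
  if (*s == 0) return NULL;     /* hit the final \0\0: no such entry */
  }
return s;
}
```

Counting through is linear in n, which the source comment accepts because the strings are needed only when there is a compilation error.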


Modified: code/trunk/pcre_dfa_exec.c
===================================================================
--- code/trunk/pcre_dfa_exec.c    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/pcre_dfa_exec.c    2010-03-10 16:08:01 UTC (rev 507)
@@ -714,12 +714,12 @@
       opcode, are not the correct length. It seems to be the only way to do
       such a check at compile time, as the sizeof() operator does not work
       in the C preprocessor. */
-      
+
       case OP_TABLE_LENGTH:
-      case OP_TABLE_LENGTH + 
+      case OP_TABLE_LENGTH +
         ((sizeof(coptable) == OP_TABLE_LENGTH) &&
          (sizeof(poptable) == OP_TABLE_LENGTH)):
-      break;           
+      break;


 /* ========================================================================== */
       /* Reached a closing bracket. If not at the end of the pattern, carry


Modified: code/trunk/pcre_globals.c
===================================================================
--- code/trunk/pcre_globals.c    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/pcre_globals.c    2010-03-10 16:08:01 UTC (rev 507)
@@ -43,10 +43,10 @@
 However, it calls memory allocation and freeing functions via the four
 indirections below, and it can optionally do callouts, using the fifth
 indirection. These values can be changed by the caller, but are shared between
-all threads. 
+all threads.


-For MS Visual Studio and Symbian OS, there are problems in initializing these
-variables to non-local functions. In these cases, therefore, an indirection via
+For MS Visual Studio and Symbian OS, there are problems in initializing these
+variables to non-local functions. In these cases, therefore, an indirection via
a local function is used.

Also, when compiling for Virtual Pascal, things are done differently, and

Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/pcre_internal.h    2010-03-10 16:08:01 UTC (rev 507)
@@ -1503,7 +1503,7 @@
 #define RREF_ANY  0xffff


/* Compile time error code numbers. They are given names so that they can more
-easily be tracked. When a new number is added, the table called eint in
+easily be tracked. When a new number is added, the table called eint in
pcreposix.c must be updated. */

enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,

Modified: code/trunk/pcre_tables.c
===================================================================
--- code/trunk/pcre_tables.c    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/pcre_tables.c    2010-03-10 16:08:01 UTC (rev 507)
@@ -249,7 +249,7 @@
 #define STRING_Zp0 STR_Z STR_p "\0"
 #define STRING_Zs0 STR_Z STR_s "\0"


-const char _pcre_utt_names[] =
+const char _pcre_utt_names[] =
STRING_Any0
STRING_Arabic0
STRING_Armenian0
@@ -382,138 +382,138 @@
STRING_Zp0
STRING_Zs0;

-const ucp_type_table _pcre_utt[] = {
- { 0, PT_ANY, 0 },
- { 4, PT_SC, ucp_Arabic },
- { 11, PT_SC, ucp_Armenian },
- { 20, PT_SC, ucp_Avestan },
- { 28, PT_SC, ucp_Balinese },
- { 37, PT_SC, ucp_Bamum },
- { 43, PT_SC, ucp_Bengali },
- { 51, PT_SC, ucp_Bopomofo },
- { 60, PT_SC, ucp_Braille },
- { 68, PT_SC, ucp_Buginese },
- { 77, PT_SC, ucp_Buhid },
- { 83, PT_GC, ucp_C },
- { 85, PT_SC, ucp_Canadian_Aboriginal },
- { 105, PT_SC, ucp_Carian },
- { 112, PT_PC, ucp_Cc },
- { 115, PT_PC, ucp_Cf },
- { 118, PT_SC, ucp_Cham },
- { 123, PT_SC, ucp_Cherokee },
- { 132, PT_PC, ucp_Cn },
- { 135, PT_PC, ucp_Co },
- { 138, PT_SC, ucp_Common },
- { 145, PT_SC, ucp_Coptic },
- { 152, PT_PC, ucp_Cs },
- { 155, PT_SC, ucp_Cuneiform },
- { 165, PT_SC, ucp_Cypriot },
- { 173, PT_SC, ucp_Cyrillic },
- { 182, PT_SC, ucp_Deseret },
- { 190, PT_SC, ucp_Devanagari },
- { 201, PT_SC, ucp_Egyptian_Hieroglyphs },
- { 222, PT_SC, ucp_Ethiopic },
- { 231, PT_SC, ucp_Georgian },
- { 240, PT_SC, ucp_Glagolitic },
- { 251, PT_SC, ucp_Gothic },
- { 258, PT_SC, ucp_Greek },
- { 264, PT_SC, ucp_Gujarati },
- { 273, PT_SC, ucp_Gurmukhi },
- { 282, PT_SC, ucp_Han },
- { 286, PT_SC, ucp_Hangul },
- { 293, PT_SC, ucp_Hanunoo },
- { 301, PT_SC, ucp_Hebrew },
- { 308, PT_SC, ucp_Hiragana },
- { 317, PT_SC, ucp_Imperial_Aramaic },
- { 334, PT_SC, ucp_Inherited },
- { 344, PT_SC, ucp_Inscriptional_Pahlavi },
- { 366, PT_SC, ucp_Inscriptional_Parthian },
- { 389, PT_SC, ucp_Javanese },
- { 398, PT_SC, ucp_Kaithi },
- { 405, PT_SC, ucp_Kannada },
- { 413, PT_SC, ucp_Katakana },
- { 422, PT_SC, ucp_Kayah_Li },
- { 431, PT_SC, ucp_Kharoshthi },
- { 442, PT_SC, ucp_Khmer },
- { 448, PT_GC, ucp_L },
- { 450, PT_LAMP, 0 },
- { 453, PT_SC, ucp_Lao },
- { 457, PT_SC, ucp_Latin },
- { 463, PT_SC, ucp_Lepcha },
- { 470, PT_SC, ucp_Limbu },
- { 476, PT_SC, ucp_Linear_B },
- { 485, PT_SC, ucp_Lisu },
- { 490, PT_PC, ucp_Ll },
- { 493, PT_PC, ucp_Lm },
- { 496, PT_PC, ucp_Lo },
- { 499, PT_PC, ucp_Lt },
- { 502, PT_PC, ucp_Lu },
- { 505, PT_SC, ucp_Lycian },
- { 512, PT_SC, ucp_Lydian },
- { 519, PT_GC, ucp_M },
- { 521, PT_SC, ucp_Malayalam },
- { 531, PT_PC, ucp_Mc },
- { 534, PT_PC, ucp_Me },
- { 537, PT_SC, ucp_Meetei_Mayek },
- { 550, PT_PC, ucp_Mn },
- { 553, PT_SC, ucp_Mongolian },
- { 563, PT_SC, ucp_Myanmar },
- { 571, PT_GC, ucp_N },
- { 573, PT_PC, ucp_Nd },
- { 576, PT_SC, ucp_New_Tai_Lue },
- { 588, PT_SC, ucp_Nko },
- { 592, PT_PC, ucp_Nl },
- { 595, PT_PC, ucp_No },
- { 598, PT_SC, ucp_Ogham },
- { 604, PT_SC, ucp_Ol_Chiki },
- { 613, PT_SC, ucp_Old_Italic },
- { 624, PT_SC, ucp_Old_Persian },
- { 636, PT_SC, ucp_Old_South_Arabian },
- { 654, PT_SC, ucp_Old_Turkic },
- { 665, PT_SC, ucp_Oriya },
- { 671, PT_SC, ucp_Osmanya },
- { 679, PT_GC, ucp_P },
- { 681, PT_PC, ucp_Pc },
- { 684, PT_PC, ucp_Pd },
- { 687, PT_PC, ucp_Pe },
- { 690, PT_PC, ucp_Pf },
- { 693, PT_SC, ucp_Phags_Pa },
- { 702, PT_SC, ucp_Phoenician },
- { 713, PT_PC, ucp_Pi },
- { 716, PT_PC, ucp_Po },
- { 719, PT_PC, ucp_Ps },
- { 722, PT_SC, ucp_Rejang },
- { 729, PT_SC, ucp_Runic },
- { 735, PT_GC, ucp_S },
- { 737, PT_SC, ucp_Samaritan },
- { 747, PT_SC, ucp_Saurashtra },
- { 758, PT_PC, ucp_Sc },
- { 761, PT_SC, ucp_Shavian },
- { 769, PT_SC, ucp_Sinhala },
- { 777, PT_PC, ucp_Sk },
- { 780, PT_PC, ucp_Sm },
- { 783, PT_PC, ucp_So },
- { 786, PT_SC, ucp_Sundanese },
- { 796, PT_SC, ucp_Syloti_Nagri },
- { 809, PT_SC, ucp_Syriac },
- { 816, PT_SC, ucp_Tagalog },
- { 824, PT_SC, ucp_Tagbanwa },
- { 833, PT_SC, ucp_Tai_Le },
- { 840, PT_SC, ucp_Tai_Tham },
- { 849, PT_SC, ucp_Tai_Viet },
- { 858, PT_SC, ucp_Tamil },
- { 864, PT_SC, ucp_Telugu },
- { 871, PT_SC, ucp_Thaana },
- { 878, PT_SC, ucp_Thai },
- { 883, PT_SC, ucp_Tibetan },
- { 891, PT_SC, ucp_Tifinagh },
- { 900, PT_SC, ucp_Ugaritic },
- { 909, PT_SC, ucp_Vai },
- { 913, PT_SC, ucp_Yi },
- { 916, PT_GC, ucp_Z },
- { 918, PT_PC, ucp_Zl },
- { 921, PT_PC, ucp_Zp },
- { 924, PT_PC, ucp_Zs }
+const ucp_type_table _pcre_utt[] = {
+ { 0, PT_ANY, 0 },
+ { 4, PT_SC, ucp_Arabic },
+ { 11, PT_SC, ucp_Armenian },
+ { 20, PT_SC, ucp_Avestan },
+ { 28, PT_SC, ucp_Balinese },
+ { 37, PT_SC, ucp_Bamum },
+ { 43, PT_SC, ucp_Bengali },
+ { 51, PT_SC, ucp_Bopomofo },
+ { 60, PT_SC, ucp_Braille },
+ { 68, PT_SC, ucp_Buginese },
+ { 77, PT_SC, ucp_Buhid },
+ { 83, PT_GC, ucp_C },
+ { 85, PT_SC, ucp_Canadian_Aboriginal },
+ { 105, PT_SC, ucp_Carian },
+ { 112, PT_PC, ucp_Cc },
+ { 115, PT_PC, ucp_Cf },
+ { 118, PT_SC, ucp_Cham },
+ { 123, PT_SC, ucp_Cherokee },
+ { 132, PT_PC, ucp_Cn },
+ { 135, PT_PC, ucp_Co },
+ { 138, PT_SC, ucp_Common },
+ { 145, PT_SC, ucp_Coptic },
+ { 152, PT_PC, ucp_Cs },
+ { 155, PT_SC, ucp_Cuneiform },
+ { 165, PT_SC, ucp_Cypriot },
+ { 173, PT_SC, ucp_Cyrillic },
+ { 182, PT_SC, ucp_Deseret },
+ { 190, PT_SC, ucp_Devanagari },
+ { 201, PT_SC, ucp_Egyptian_Hieroglyphs },
+ { 222, PT_SC, ucp_Ethiopic },
+ { 231, PT_SC, ucp_Georgian },
+ { 240, PT_SC, ucp_Glagolitic },
+ { 251, PT_SC, ucp_Gothic },
+ { 258, PT_SC, ucp_Greek },
+ { 264, PT_SC, ucp_Gujarati },
+ { 273, PT_SC, ucp_Gurmukhi },
+ { 282, PT_SC, ucp_Han },
+ { 286, PT_SC, ucp_Hangul },
+ { 293, PT_SC, ucp_Hanunoo },
+ { 301, PT_SC, ucp_Hebrew },
+ { 308, PT_SC, ucp_Hiragana },
+ { 317, PT_SC, ucp_Imperial_Aramaic },
+ { 334, PT_SC, ucp_Inherited },
+ { 344, PT_SC, ucp_Inscriptional_Pahlavi },
+ { 366, PT_SC, ucp_Inscriptional_Parthian },
+ { 389, PT_SC, ucp_Javanese },
+ { 398, PT_SC, ucp_Kaithi },
+ { 405, PT_SC, ucp_Kannada },
+ { 413, PT_SC, ucp_Katakana },
+ { 422, PT_SC, ucp_Kayah_Li },
+ { 431, PT_SC, ucp_Kharoshthi },
+ { 442, PT_SC, ucp_Khmer },
+ { 448, PT_GC, ucp_L },
+ { 450, PT_LAMP, 0 },
+ { 453, PT_SC, ucp_Lao },
+ { 457, PT_SC, ucp_Latin },
+ { 463, PT_SC, ucp_Lepcha },
+ { 470, PT_SC, ucp_Limbu },
+ { 476, PT_SC, ucp_Linear_B },
+ { 485, PT_SC, ucp_Lisu },
+ { 490, PT_PC, ucp_Ll },
+ { 493, PT_PC, ucp_Lm },
+ { 496, PT_PC, ucp_Lo },
+ { 499, PT_PC, ucp_Lt },
+ { 502, PT_PC, ucp_Lu },
+ { 505, PT_SC, ucp_Lycian },
+ { 512, PT_SC, ucp_Lydian },
+ { 519, PT_GC, ucp_M },
+ { 521, PT_SC, ucp_Malayalam },
+ { 531, PT_PC, ucp_Mc },
+ { 534, PT_PC, ucp_Me },
+ { 537, PT_SC, ucp_Meetei_Mayek },
+ { 550, PT_PC, ucp_Mn },
+ { 553, PT_SC, ucp_Mongolian },
+ { 563, PT_SC, ucp_Myanmar },
+ { 571, PT_GC, ucp_N },
+ { 573, PT_PC, ucp_Nd },
+ { 576, PT_SC, ucp_New_Tai_Lue },
+ { 588, PT_SC, ucp_Nko },
+ { 592, PT_PC, ucp_Nl },
+ { 595, PT_PC, ucp_No },
+ { 598, PT_SC, ucp_Ogham },
+ { 604, PT_SC, ucp_Ol_Chiki },
+ { 613, PT_SC, ucp_Old_Italic },
+ { 624, PT_SC, ucp_Old_Persian },
+ { 636, PT_SC, ucp_Old_South_Arabian },
+ { 654, PT_SC, ucp_Old_Turkic },
+ { 665, PT_SC, ucp_Oriya },
+ { 671, PT_SC, ucp_Osmanya },
+ { 679, PT_GC, ucp_P },
+ { 681, PT_PC, ucp_Pc },
+ { 684, PT_PC, ucp_Pd },
+ { 687, PT_PC, ucp_Pe },
+ { 690, PT_PC, ucp_Pf },
+ { 693, PT_SC, ucp_Phags_Pa },
+ { 702, PT_SC, ucp_Phoenician },
+ { 713, PT_PC, ucp_Pi },
+ { 716, PT_PC, ucp_Po },
+ { 719, PT_PC, ucp_Ps },
+ { 722, PT_SC, ucp_Rejang },
+ { 729, PT_SC, ucp_Runic },
+ { 735, PT_GC, ucp_S },
+ { 737, PT_SC, ucp_Samaritan },
+ { 747, PT_SC, ucp_Saurashtra },
+ { 758, PT_PC, ucp_Sc },
+ { 761, PT_SC, ucp_Shavian },
+ { 769, PT_SC, ucp_Sinhala },
+ { 777, PT_PC, ucp_Sk },
+ { 780, PT_PC, ucp_Sm },
+ { 783, PT_PC, ucp_So },
+ { 786, PT_SC, ucp_Sundanese },
+ { 796, PT_SC, ucp_Syloti_Nagri },
+ { 809, PT_SC, ucp_Syriac },
+ { 816, PT_SC, ucp_Tagalog },
+ { 824, PT_SC, ucp_Tagbanwa },
+ { 833, PT_SC, ucp_Tai_Le },
+ { 840, PT_SC, ucp_Tai_Tham },
+ { 849, PT_SC, ucp_Tai_Viet },
+ { 858, PT_SC, ucp_Tamil },
+ { 864, PT_SC, ucp_Telugu },
+ { 871, PT_SC, ucp_Thaana },
+ { 878, PT_SC, ucp_Thai },
+ { 883, PT_SC, ucp_Tibetan },
+ { 891, PT_SC, ucp_Tifinagh },
+ { 900, PT_SC, ucp_Ugaritic },
+ { 909, PT_SC, ucp_Vai },
+ { 913, PT_SC, ucp_Yi },
+ { 916, PT_GC, ucp_Z },
+ { 918, PT_PC, ucp_Zl },
+ { 921, PT_PC, ucp_Zp },
+ { 924, PT_PC, ucp_Zs }
};

const int _pcre_utt_size = sizeof(_pcre_utt)/sizeof(ucp_type_table);
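
The first field of each _pcre_utt entry is an offset into the _pcre_utt_names string: "Any" plus its \0 occupies 4 bytes, so "Arabic" starts at offset 4 and "Armenian" at offset 11. The sketch below reproduces that layout for the first three names only. The demo_ names, the struct fields, and the payload values are assumptions for illustration, and the binary search is one plausible way to use a table sorted by name, not necessarily what PCRE does.

```c
#include <string.h>

typedef struct {
  int name_offset;   /* offset of the name within demo_names */
  int value;         /* hypothetical payload */
} demo_entry;

/* Names stored back to back, each terminated by \0, in sorted order. */
static const char demo_names[] =
  "Any\0"
  "Arabic\0"
  "Armenian\0";

static const demo_entry demo_table[] = {
  {  0, 100 },
  {  4, 101 },
  { 11, 102 }
};

static const int demo_table_size = sizeof(demo_table)/sizeof(demo_entry);

/* Look a name up by binary search; return its value, or -1 if absent. */
int demo_lookup(const char *name)
{
int bot = 0;
int top = demo_table_size;
while (bot < top)
  {
  int mid = (bot + top) / 2;
  int c = strcmp(name, demo_names + demo_table[mid].name_offset);
  if (c == 0) return demo_table[mid].value;
  if (c > 0) bot = mid + 1; else top = mid;
  }
return -1;
}
```

Keeping offsets instead of pointers in the table means the table holds only integers, at the cost of having to regenerate the offsets whenever a name is added.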

Modified: code/trunk/pcreposix.c
===================================================================
--- code/trunk/pcreposix.c    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/pcreposix.c    2010-03-10 16:08:01 UTC (rev 507)
@@ -372,7 +372,7 @@
   error if the vector eint, which is indexed by compile-time error number, is
   not the correct length. It seems to be the only way to do such a check at
   compile time, as the sizeof() operator does not work in the C preprocessor.
-  As all the PCRE_ERROR_xxx values are negative, we can use 0 and 1. */ 
+  As all the PCRE_ERROR_xxx values are negative, we can use 0 and 1. */


case 0:
case (sizeof(eint)/sizeof(int) == ERRCOUNT):
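
The case labels above, like the OP_TABLE_LENGTH ones in pcre_dfa_exec.c, rely on the rule that two case labels in one switch may not have the same value. If the table has the wrong length, the comparison evaluates to 0 and duplicates "case 0", so compilation fails. A minimal sketch follows, with hypothetical names (demo_eint, DEMO_ERRCOUNT, demo_is_zero_or_one) that do not appear in PCRE.

```c
/* Hypothetical error numbers; DEMO_ERRCOUNT is the number of entries. */
enum { DEMO_ERR0, DEMO_ERR1, DEMO_ERR2, DEMO_ERRCOUNT };

/* Table that must have exactly DEMO_ERRCOUNT entries. */
const int demo_eint[] = { 0, 17, 42 };

int demo_is_zero_or_one(int rc)
{
switch (rc)
  {
  /* If the table length is correct, the second label is "case 1" and the
     code compiles. If it is wrong, the label becomes a second "case 0"
     and the compiler rejects the duplicate. */
  case 0:
  case ((int)(sizeof(demo_eint)/sizeof(int)) == DEMO_ERRCOUNT):
  return 1;

  default:
  return 0;
  }
}
```

Removing one initializer from demo_eint should turn the second label into a duplicate "case 0" and stop the build, which is the point of the technique: sizeof() cannot be tested in the preprocessor, but it can appear in a case label.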

Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/pcretest.c    2010-03-10 16:08:01 UTC (rev 507)
@@ -118,7 +118,7 @@


/* We also need the pcre_printint() function for printing out compiled
patterns. This function is in a separate file so that it can be included in
-pcre_compile.c when that module is compiled with debugging enabled. It needs to
+pcre_compile.c when that module is compiled with debugging enabled. It needs to
know which case is being compiled. */

#define COMPILING_PCRETEST

Modified: code/trunk/testdata/testinput12
===================================================================
--- code/trunk/testdata/testinput12    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/testdata/testinput12    2010-03-10 16:08:01 UTC (rev 507)
@@ -202,5 +202,14 @@
 /(?i:[\x{c0}])/8
     \x{c0}
     \x{e0} 
+
+/-- This should be Perl-compatible but Perl 5.11 gets \x{300} wrong. --/8


+/^\X/8
+    A
+    A\x{300}BC 
+    A\x{300}\x{301}\x{302}BC 
+    *** Failers
+    \x{300}  
+
 /-- End of testinput12 --/


Modified: code/trunk/testdata/testinput6
===================================================================
--- code/trunk/testdata/testinput6    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/testdata/testinput6    2010-03-10 16:08:01 UTC (rev 507)
@@ -370,13 +370,6 @@
     \x{3b1}
     \x{ff5a}   


-/^\X/8
-    A
-    A\x{300}BC 
-    A\x{300}\x{301}\x{302}BC 
-    *** Failers
-    \x{300}  
-
 /^[\X]/8
     X123
     *** Failers


Modified: code/trunk/testdata/testoutput12
===================================================================
--- code/trunk/testdata/testoutput12    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/testdata/testoutput12    2010-03-10 16:08:01 UTC (rev 507)
@@ -470,5 +470,19 @@
  0: \x{c0}
     \x{e0} 
  0: \x{e0}
+
+/-- This should be Perl-compatible but Perl 5.11 gets \x{300} wrong. --/8


+/^\X/8
+    A
+ 0: A
+    A\x{300}BC 
+ 0: A\x{300}
+    A\x{300}\x{301}\x{302}BC 
+ 0: A\x{300}\x{301}\x{302}
+    *** Failers
+ 0: *
+    \x{300}  
+No match
+
 /-- End of testinput12 --/


Modified: code/trunk/testdata/testoutput6
===================================================================
--- code/trunk/testdata/testoutput6    2010-03-09 17:01:40 UTC (rev 506)
+++ code/trunk/testdata/testoutput6    2010-03-10 16:08:01 UTC (rev 507)
@@ -618,18 +618,6 @@
     \x{ff5a}   
  0: \x{ff5a}


-/^\X/8
-    A
- 0: A
-    A\x{300}BC 
- 0: A\x{300}
-    A\x{300}\x{301}\x{302}BC 
- 0: A\x{300}\x{301}\x{302}
-    *** Failers
- 0: *
-    \x{300}  
-No match
-
 /^[\X]/8
     X123
  0: X