[Pcre-svn] [738] code/trunk: Source tidies for 8.20 release.

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [738] code/trunk: Source tidies for 8.20 release.
Revision: 738
          http://vcs.pcre.org/viewvc?view=rev&revision=738
Author:   ph10
Date:     2011-10-21 10:04:01 +0100 (Fri, 21 Oct 2011)


Log Message:
-----------
Source tidies for 8.20 release.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/NEWS
    code/trunk/configure.ac
    code/trunk/doc/html/pcrejit.html
    code/trunk/doc/html/pcrepattern.html
    code/trunk/doc/html/pcreunicode.html
    code/trunk/doc/pcre.txt
    code/trunk/doc/pcrepattern.3
    code/trunk/pcretest.c


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2011-10-19 17:37:29 UTC (rev 737)
+++ code/trunk/ChangeLog    2011-10-21 09:04:01 UTC (rev 738)
@@ -1,7 +1,7 @@
 ChangeLog for PCRE
 ------------------


-Version 8.20 10-Oct-2011
+Version 8.20 21-Oct-2011
------------------------

1. Change 37 of 8.13 broke patterns like [:a]...[b:] because it thought it had
@@ -68,7 +68,7 @@

 12. In some environments, the output of pcretest -C is CRLF terminated. This
     broke RunTest's code that checks for the link size. A single white space
-    after the value is now allowed for.
+    character after the value is now allowed for.


 13. RunTest now checks for the "fr" locale as well as for "fr_FR" and "french".
     For "fr", it uses the Windows-specific input and output files.
@@ -103,16 +103,16 @@
 19. If the PCRE_NO_START_OPTIMIZE option was set for pcre_compile(), it did not
     suppress the check for a minimum subject length at run time. (If it was
     given to pcre_exec() or pcre_dfa_exec() it did work.)
-    
+
 20. Fixed an ASCII-dependent infelicity in pcretest that would have made it
-    fail to work when decoding hex characters in data strings in EBCDIC 
-    environments. 
-    
-21. It appears that in at least one Mac OS environment, the isxdigit() function 
+    fail to work when decoding hex characters in data strings in EBCDIC
+    environments.
+
+21. It appears that in at least one Mac OS environment, the isxdigit() function
     is implemented as a macro that evaluates to its argument more than once,
     contravening the C 90 Standard (I haven't checked a later standard). There
     was an instance in pcretest which caused it to go wrong when processing
-    \x{...} escapes in subject strings. The has been rewritten to avoid using 
+    \x{...} escapes in subject strings. The has been rewritten to avoid using
     things like p++ in the argument of isxdigit().




Modified: code/trunk/NEWS
===================================================================
--- code/trunk/NEWS    2011-10-19 17:37:29 UTC (rev 737)
+++ code/trunk/NEWS    2011-10-21 09:04:01 UTC (rev 738)
@@ -1,14 +1,14 @@
 News about PCRE releases
 ------------------------


-Release 8.20
-------------
+Release 8.20 21-Oct-2011
+------------------------

The main change in this release is the inclusion of Zoltan Herczeg's
just-in-time compiler support, which can be accessed by building PCRE with
--enable-jit. Large performance benefits can be had in many situations. 8.20
also fixes an unfortunate bug that was introduced in 8.13 as well as tidying up
-a couple of infelicities.
+a number of infelicities and differences from Perl.


Release 8.13 16-Aug-2011

Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac    2011-10-19 17:37:29 UTC (rev 737)
+++ code/trunk/configure.ac    2011-10-21 09:04:01 UTC (rev 738)
@@ -10,8 +10,8 @@


m4_define(pcre_major, [8])
m4_define(pcre_minor, [20])
-m4_define(pcre_prerelease, [-RC3])
-m4_define(pcre_date, [2011-10-10])
+m4_define(pcre_prerelease, [])
+m4_define(pcre_date, [2011-10-21])

# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [0:1:0])

Modified: code/trunk/doc/html/pcrejit.html
===================================================================
--- code/trunk/doc/html/pcrejit.html    2011-10-19 17:37:29 UTC (rev 737)
+++ code/trunk/doc/html/pcrejit.html    2011-10-21 09:04:01 UTC (rev 738)
@@ -116,7 +116,7 @@
 <P>
 The unsupported pattern items are:
 <pre>
-  \C            match a single byte, even in UTF-8 mode
+  \C            match a single byte; not supported in UTF-8 mode
   (?Cn)          callouts
   (?(&#60;name&#62;)...  conditional test on setting of a named subpattern
   (?(R)...       conditional test on whole pattern recursion
@@ -275,7 +275,7 @@
 </P>
 <br><a name="SEC11" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 05 October 2011
+Last updated: 19 October 2011
 <br>
 Copyright &copy; 1997-2011 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcrepattern.html
===================================================================
--- code/trunk/doc/html/pcrepattern.html    2011-10-19 17:37:29 UTC (rev 737)
+++ code/trunk/doc/html/pcrepattern.html    2011-10-21 09:04:01 UTC (rev 738)
@@ -968,17 +968,39 @@
 <br><a name="SEC7" href="#TOC1">MATCHING A SINGLE BYTE</a><br>
 <P>
 Outside a character class, the escape sequence \C matches any one byte, both
-in and out of UTF-8 mode. Unlike a dot, it always matches any line-ending
+in and out of UTF-8 mode. Unlike a dot, it always matches line-ending
 characters. The feature is provided in Perl in order to match individual bytes
-in UTF-8 mode. Because it breaks up UTF-8 characters into individual bytes, the
-rest of the string may start with a malformed UTF-8 character. For this reason,
-the \C escape sequence is best avoided.
+in UTF-8 mode, but it is unclear how it can usefully be used. Because \C
+breaks up characters into individual bytes, matching one byte with \C in UTF-8
+mode means that the rest of the string may start with a malformed UTF-8
+character. This has undefined results, because PCRE assumes that it is dealing
+with valid UTF-8 strings (and by default it checks this at the start of
+processing unless the PCRE_NO_UTF8_CHECK option is used).
 </P>
 <P>
 PCRE does not allow \C to appear in lookbehind assertions
 <a href="#lookbehind">(described below),</a>
 because in UTF-8 mode this would make it impossible to calculate the length of
 the lookbehind.
+</P>
+<P>
+In general, the \C escape sequence is best avoided in UTF-8 mode. However, one
+way of using it that avoids the problem of malformed UTF-8 characters is to
+use a lookahead to check the length of the next character, as in this pattern
+(ignore white space and line breaks):
+<pre>
+  (?| (?=[\x00-\x7f])(\C) |
+      (?=[\x80-\x{7ff}])(\C)(\C) |
+      (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) |
+      (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))
+</pre>
+A group that starts with (?| resets the capturing parentheses numbers in each
+alternative (see
+<a href="#dupsubpatternnumber">"Duplicate Subpattern Numbers"</a>
+below). The assertions at the start of each branch check the next UTF-8
+character for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The
+character's individual bytes are then captured by the appropriate number of
+groups.
 <a name="characterclass"></a></P>
 <br><a name="SEC8" href="#TOC1">SQUARE BRACKETS AND CHARACTER CLASSES</a><br>
 <P>
@@ -2797,7 +2819,7 @@
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 09 October 2011
+Last updated: 19 October 2011
 <br>
 Copyright &copy; 1997-2011 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcreunicode.html
===================================================================
--- code/trunk/doc/html/pcreunicode.html    2011-10-19 17:37:29 UTC (rev 737)
+++ code/trunk/doc/html/pcreunicode.html    2011-10-21 09:04:01 UTC (rev 738)
@@ -118,11 +118,14 @@
 </P>
 <P>
 5. The escape sequence \C can be used to match a single byte in UTF-8 mode,
-but its use can lead to some strange effects. This facility is not available in
-the alternative matching function, <b>pcre_dfa_exec()</b>, nor is it supported
-by the JIT optimization of <b>pcre_exec()</b>. If JIT optimization is requested
-for a pattern that contains \C, it will not succeed, and so the matching will
-be carried out by the normal interpretive function.
+but its use can lead to some strange effects because it breaks up multibyte
+characters (see the description of \C in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation). The use of \C is not supported in the alternative matching
+function <b>pcre_dfa_exec()</b>, nor is it supported in UTF-8 mode by the JIT
+optimization of <b>pcre_exec()</b>. If JIT optimization is requested for a UTF-8
+pattern that contains \C, it will not succeed, and so the matching will be
+carried out by the normal interpretive function.
 </P>
 <P>
 6. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
@@ -175,7 +178,7 @@
 REVISION
 </b><br>
 <P>
-Last updated: 06 September 2011
+Last updated: 19 October 2011
 <br>
 Copyright &copy; 1997-2011 University of Cambridge.
 <br>


Modified: code/trunk/doc/pcre.txt
===================================================================
--- code/trunk/doc/pcre.txt    2011-10-19 17:37:29 UTC (rev 737)
+++ code/trunk/doc/pcre.txt    2011-10-21 09:04:01 UTC (rev 738)
@@ -4142,127 +4142,147 @@
 MATCHING A SINGLE BYTE


        Outside a character class, the escape sequence \C matches any one byte,
-       both  in  and  out  of  UTF-8 mode. Unlike a dot, it always matches any
-       line-ending characters. The feature is provided in  Perl  in  order  to
-       match  individual bytes in UTF-8 mode. Because it breaks up UTF-8 char-
-       acters into individual bytes, the rest of the string may start  with  a
-       malformed  UTF-8  character. For this reason, the \C escape sequence is
-       best avoided.
+       both  in  and  out of UTF-8 mode. Unlike a dot, it always matches line-
+       ending characters. The feature is provided in Perl in  order  to  match
+       individual  bytes  in UTF-8 mode, but it is unclear how it can usefully
+       be used. Because \C breaks up characters into individual bytes,  match-
+       ing  one  byte  with \C in UTF-8 mode means that the rest of the string
+       may start with a malformed UTF-8 character. This has undefined results,
+       because  PCRE  assumes that it is dealing with valid UTF-8 strings (and
+       by default it checks  this  at  the  start  of  processing  unless  the
+       PCRE_NO_UTF8_CHECK option is used).


-       PCRE does not allow \C to appear in  lookbehind  assertions  (described
-       below),  because  in UTF-8 mode this would make it impossible to calcu-
+       PCRE  does  not  allow \C to appear in lookbehind assertions (described
+       below), because in UTF-8 mode this would make it impossible  to  calcu-
        late the length of the lookbehind.


+       In  general, the \C escape sequence is best avoided in UTF-8 mode. How-
+       ever, one way of using it that avoids the problem  of  malformed  UTF-8
+       characters  is to use a lookahead to check the length of the next char-
+       acter, as in this pattern (ignore white space and line breaks):


+         (?| (?=[\x00-\x7f])(\C) |
+             (?=[\x80-\x{7ff}])(\C)(\C) |
+             (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) |
+             (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))
+
+       A group that starts with (?| resets the capturing  parentheses  numbers
+       in  each  alternative  (see  "Duplicate Subpattern Numbers" below). The
+       assertions at the start of each branch check the next  UTF-8  character
+       for  values  whose encoding uses 1, 2, 3, or 4 bytes, respectively. The
+       character's individual bytes are then captured by the appropriate  num-
+       ber of groups.
+
+
 SQUARE BRACKETS AND CHARACTER CLASSES


        An opening square bracket introduces a character class, terminated by a
        closing square bracket. A closing square bracket on its own is not spe-
        cial by default.  However, if the PCRE_JAVASCRIPT_COMPAT option is set,
        a lone closing square bracket causes a compile-time error. If a closing
-       square bracket is required as a member of the class, it should  be  the
-       first  data  character  in  the  class (after an initial circumflex, if
+       square  bracket  is required as a member of the class, it should be the
+       first data character in the class  (after  an  initial  circumflex,  if
        present) or escaped with a backslash.


-       A character class matches a single character in the subject.  In  UTF-8
+       A  character  class matches a single character in the subject. In UTF-8
        mode, the character may be more than one byte long. A matched character
        must be in the set of characters defined by the class, unless the first
-       character  in  the  class definition is a circumflex, in which case the
-       subject character must not be in the set defined by  the  class.  If  a
-       circumflex  is actually required as a member of the class, ensure it is
+       character in the class definition is a circumflex, in  which  case  the
+       subject  character  must  not  be in the set defined by the class. If a
+       circumflex is actually required as a member of the class, ensure it  is
        not the first character, or escape it with a backslash.


-       For example, the character class [aeiou] matches any lower case  vowel,
-       while  [^aeiou]  matches  any character that is not a lower case vowel.
+       For  example, the character class [aeiou] matches any lower case vowel,
+       while [^aeiou] matches any character that is not a  lower  case  vowel.
        Note that a circumflex is just a convenient notation for specifying the
-       characters  that  are in the class by enumerating those that are not. A
-       class that starts with a circumflex is not an assertion; it still  con-
-       sumes  a  character  from the subject string, and therefore it fails if
+       characters that are in the class by enumerating those that are  not.  A
+       class  that starts with a circumflex is not an assertion; it still con-
+       sumes a character from the subject string, and therefore  it  fails  if
        the current pointer is at the end of the string.


-       In UTF-8 mode, characters with values greater than 255 can be  included
-       in  a  class as a literal string of bytes, or by using the \x{ escaping
+       In  UTF-8 mode, characters with values greater than 255 can be included
+       in a class as a literal string of bytes, or by using the  \x{  escaping
        mechanism.


-       When caseless matching is set, any letters in a  class  represent  both
-       their  upper  case  and lower case versions, so for example, a caseless
-       [aeiou] matches "A" as well as "a", and a caseless  [^aeiou]  does  not
-       match  "A", whereas a caseful version would. In UTF-8 mode, PCRE always
-       understands the concept of case for characters whose  values  are  less
-       than  128, so caseless matching is always possible. For characters with
-       higher values, the concept of case is supported  if  PCRE  is  compiled
-       with  Unicode  property support, but not otherwise.  If you want to use
-       caseless matching in UTF8-mode for characters 128 and above,  you  must
-       ensure  that  PCRE is compiled with Unicode property support as well as
+       When  caseless  matching  is set, any letters in a class represent both
+       their upper case and lower case versions, so for  example,  a  caseless
+       [aeiou]  matches  "A"  as well as "a", and a caseless [^aeiou] does not
+       match "A", whereas a caseful version would. In UTF-8 mode, PCRE  always
+       understands  the  concept  of case for characters whose values are less
+       than 128, so caseless matching is always possible. For characters  with
+       higher  values,  the  concept  of case is supported if PCRE is compiled
+       with Unicode property support, but not otherwise.  If you want  to  use
+       caseless  matching  in UTF8-mode for characters 128 and above, you must
+       ensure that PCRE is compiled with Unicode property support as  well  as
        with UTF-8 support.


-       Characters that might indicate line breaks are  never  treated  in  any
-       special  way  when  matching  character  classes,  whatever line-ending
-       sequence is in  use,  and  whatever  setting  of  the  PCRE_DOTALL  and
+       Characters  that  might  indicate  line breaks are never treated in any
+       special way  when  matching  character  classes,  whatever  line-ending
+       sequence  is  in  use,  and  whatever  setting  of  the PCRE_DOTALL and
        PCRE_MULTILINE options is used. A class such as [^a] always matches one
        of these characters.


-       The minus (hyphen) character can be used to specify a range of  charac-
-       ters  in  a  character  class.  For  example,  [d-m] matches any letter
-       between d and m, inclusive. If a  minus  character  is  required  in  a
-       class,  it  must  be  escaped  with a backslash or appear in a position
-       where it cannot be interpreted as indicating a range, typically as  the
+       The  minus (hyphen) character can be used to specify a range of charac-
+       ters in a character  class.  For  example,  [d-m]  matches  any  letter
+       between  d  and  m,  inclusive.  If  a minus character is required in a
+       class, it must be escaped with a backslash  or  appear  in  a  position
+       where  it cannot be interpreted as indicating a range, typically as the
        first or last character in the class.


        It is not possible to have the literal character "]" as the end charac-
-       ter of a range. A pattern such as [W-]46] is interpreted as a class  of
-       two  characters ("W" and "-") followed by a literal string "46]", so it
-       would match "W46]" or "-46]". However, if the "]"  is  escaped  with  a
-       backslash  it is interpreted as the end of range, so [W-\]46] is inter-
-       preted as a class containing a range followed by two other  characters.
-       The  octal or hexadecimal representation of "]" can also be used to end
+       ter  of a range. A pattern such as [W-]46] is interpreted as a class of
+       two characters ("W" and "-") followed by a literal string "46]", so  it
+       would  match  "W46]"  or  "-46]". However, if the "]" is escaped with a
+       backslash it is interpreted as the end of range, so [W-\]46] is  inter-
+       preted  as a class containing a range followed by two other characters.
+       The octal or hexadecimal representation of "]" can also be used to  end
        a range.


-       Ranges operate in the collating sequence of character values. They  can
-       also   be  used  for  characters  specified  numerically,  for  example
-       [\000-\037]. In UTF-8 mode, ranges can include characters whose  values
+       Ranges  operate in the collating sequence of character values. They can
+       also  be  used  for  characters  specified  numerically,  for   example
+       [\000-\037].  In UTF-8 mode, ranges can include characters whose values
        are greater than 255, for example [\x{100}-\x{2ff}].


        If a range that includes letters is used when caseless matching is set,
        it matches the letters in either case. For example, [W-c] is equivalent
-       to  [][\\^_`wxyzabc],  matched  caselessly,  and  in non-UTF-8 mode, if
-       character tables for a French locale are in  use,  [\xc8-\xcb]  matches
-       accented  E  characters in both cases. In UTF-8 mode, PCRE supports the
-       concept of case for characters with values greater than 128  only  when
+       to [][\\^_`wxyzabc], matched caselessly,  and  in  non-UTF-8  mode,  if
+       character  tables  for  a French locale are in use, [\xc8-\xcb] matches
+       accented E characters in both cases. In UTF-8 mode, PCRE  supports  the
+       concept  of  case for characters with values greater than 128 only when
        it is compiled with Unicode property support.


-       The  character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v, \V,
+       The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,  \V,
        \w, and \W may appear in a character class, and add the characters that
-       they  match to the class. For example, [\dABCDEF] matches any hexadeci-
-       mal digit. In UTF-8 mode, the PCRE_UCP option affects the  meanings  of
-       \d,  \s,  \w  and  their upper case partners, just as it does when they
-       appear outside a character class, as described in the section  entitled
+       they match to the class. For example, [\dABCDEF] matches any  hexadeci-
+       mal  digit.  In UTF-8 mode, the PCRE_UCP option affects the meanings of
+       \d, \s, \w and their upper case partners, just as  it  does  when  they
+       appear  outside a character class, as described in the section entitled
        "Generic character types" above. The escape sequence \b has a different
-       meaning inside a character class; it matches the  backspace  character.
-       The  sequences  \B,  \N,  \R, and \X are not special inside a character
-       class. Like any other unrecognized escape sequences, they  are  treated
-       as  the literal characters "B", "N", "R", and "X" by default, but cause
+       meaning  inside  a character class; it matches the backspace character.
+       The sequences \B, \N, \R, and \X are not  special  inside  a  character
+       class.  Like  any other unrecognized escape sequences, they are treated
+       as the literal characters "B", "N", "R", and "X" by default, but  cause
        an error if the PCRE_EXTRA option is set.


-       A circumflex can conveniently be used with  the  upper  case  character
-       types  to specify a more restricted set of characters than the matching
-       lower case type.  For example, the class [^\W_] matches any  letter  or
+       A  circumflex  can  conveniently  be used with the upper case character
+       types to specify a more restricted set of characters than the  matching
+       lower  case  type.  For example, the class [^\W_] matches any letter or
        digit, but not underscore, whereas [\w] includes underscore. A positive
        character class should be read as "something OR something OR ..." and a
        negative class as "NOT something AND NOT something AND NOT ...".


-       The  only  metacharacters  that are recognized in character classes are
-       backslash, hyphen (only where it can be  interpreted  as  specifying  a
-       range),  circumflex  (only  at the start), opening square bracket (only
-       when it can be interpreted as introducing a POSIX class name - see  the
-       next  section),  and  the  terminating closing square bracket. However,
+       The only metacharacters that are recognized in  character  classes  are
+       backslash,  hyphen  (only  where  it can be interpreted as specifying a
+       range), circumflex (only at the start), opening  square  bracket  (only
+       when  it can be interpreted as introducing a POSIX class name - see the
+       next section), and the terminating  closing  square  bracket.  However,
        escaping other non-alphanumeric characters does no harm.



POSIX CHARACTER CLASSES

        Perl supports the POSIX notation for character classes. This uses names
-       enclosed  by  [: and :] within the enclosing square brackets. PCRE also
+       enclosed by [: and :] within the enclosing square brackets.  PCRE  also
        supports this notation. For example,


          [01[:alpha:]%]
@@ -4285,24 +4305,24 @@
          word     "word" characters (same as \w)
          xdigit   hexadecimal digits


-       The  "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
-       and space (32). Notice that this list includes the VT  character  (code
+       The "space" characters are HT (9), LF (10), VT (11), FF (12), CR  (13),
+       and  space  (32). Notice that this list includes the VT character (code
        11). This makes "space" different to \s, which does not include VT (for
        Perl compatibility).


-       The name "word" is a Perl extension, and "blank"  is  a  GNU  extension
-       from  Perl  5.8. Another Perl extension is negation, which is indicated
+       The  name  "word"  is  a Perl extension, and "blank" is a GNU extension
+       from Perl 5.8. Another Perl extension is negation, which  is  indicated
        by a ^ character after the colon. For example,


          [12[:^digit:]]


-       matches "1", "2", or any non-digit. PCRE (and Perl) also recognize  the
+       matches  "1", "2", or any non-digit. PCRE (and Perl) also recognize the
        POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but
        these are not supported, and an error is given if they are encountered.


-       By default, in UTF-8 mode, characters with values greater than  128  do
-       not  match any of the POSIX character classes. However, if the PCRE_UCP
-       option is passed to pcre_compile(), some of the classes are changed  so
+       By  default,  in UTF-8 mode, characters with values greater than 128 do
+       not match any of the POSIX character classes. However, if the  PCRE_UCP
+       option  is passed to pcre_compile(), some of the classes are changed so
        that Unicode character properties are used. This is achieved by replac-
        ing the POSIX classes by other sequences, as follows:


@@ -4315,31 +4335,31 @@
          [:upper:]  becomes  \p{Lu}
          [:word:]   becomes  \p{Xwd}


-       Negated versions, such as [:^alpha:] use \P instead of  \p.  The  other
+       Negated  versions,  such  as [:^alpha:] use \P instead of \p. The other
        POSIX classes are unchanged, and match only characters with code points
        less than 128.



VERTICAL BAR

-       Vertical bar characters are used to separate alternative patterns.  For
+       Vertical  bar characters are used to separate alternative patterns. For
        example, the pattern


          gilbert|sullivan


-       matches  either "gilbert" or "sullivan". Any number of alternatives may
-       appear, and an empty  alternative  is  permitted  (matching  the  empty
+       matches either "gilbert" or "sullivan". Any number of alternatives  may
+       appear,  and  an  empty  alternative  is  permitted (matching the empty
        string). The matching process tries each alternative in turn, from left
-       to right, and the first one that succeeds is used. If the  alternatives
-       are  within a subpattern (defined below), "succeeds" means matching the
+       to  right, and the first one that succeeds is used. If the alternatives
+       are within a subpattern (defined below), "succeeds" means matching  the
        rest of the main pattern as well as the alternative in the subpattern.



INTERNAL OPTION SETTING

-       The settings of the  PCRE_CASELESS,  PCRE_MULTILINE,  PCRE_DOTALL,  and
-       PCRE_EXTENDED  options  (which are Perl-compatible) can be changed from
-       within the pattern by  a  sequence  of  Perl  option  letters  enclosed
+       The  settings  of  the  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
+       PCRE_EXTENDED options (which are Perl-compatible) can be  changed  from
+       within  the  pattern  by  a  sequence  of  Perl option letters enclosed
        between "(?" and ")".  The option letters are


          i  for PCRE_CASELESS
@@ -4349,47 +4369,47 @@


        For example, (?im) sets caseless, multiline matching. It is also possi-
        ble to unset these options by preceding the letter with a hyphen, and a
-       combined  setting and unsetting such as (?im-sx), which sets PCRE_CASE-
-       LESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and  PCRE_EXTENDED,
-       is  also  permitted.  If  a  letter  appears  both before and after the
+       combined setting and unsetting such as (?im-sx), which sets  PCRE_CASE-
+       LESS  and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED,
+       is also permitted. If a  letter  appears  both  before  and  after  the
        hyphen, the option is unset.


-       The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and  PCRE_EXTRA
-       can  be changed in the same way as the Perl-compatible options by using
+       The  PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA
+       can be changed in the same way as the Perl-compatible options by  using
        the characters J, U and X respectively.


-       When one of these option changes occurs at  top  level  (that  is,  not
-       inside  subpattern parentheses), the change applies to the remainder of
+       When  one  of  these  option  changes occurs at top level (that is, not
+       inside subpattern parentheses), the change applies to the remainder  of
        the pattern that follows. If the change is placed right at the start of
        a pattern, PCRE extracts it into the global options (and it will there-
        fore show up in data extracted by the pcre_fullinfo() function).


-       An option change within a subpattern (see below for  a  description  of
-       subpatterns)  affects only that part of the subpattern that follows it,
+       An  option  change  within a subpattern (see below for a description of
+       subpatterns) affects only that part of the subpattern that follows  it,
        so


          (a(?i)b)c


        matches abc and aBc and no other strings (assuming PCRE_CASELESS is not
-       used).   By  this means, options can be made to have different settings
-       in different parts of the pattern. Any changes made in one  alternative
-       do  carry  on  into subsequent branches within the same subpattern. For
+       used).  By this means, options can be made to have  different  settings
+       in  different parts of the pattern. Any changes made in one alternative
+       do carry on into subsequent branches within the  same  subpattern.  For
        example,


          (a(?i)b|c)


-       matches "ab", "aB", "c", and "C", even though  when  matching  "C"  the
-       first  branch  is  abandoned before the option setting. This is because
-       the effects of option settings happen at compile time. There  would  be
+       matches  "ab",  "aB",  "c",  and "C", even though when matching "C" the
+       first branch is abandoned before the option setting.  This  is  because
+       the  effects  of option settings happen at compile time. There would be
        some very weird behaviour otherwise.


-       Note:  There  are  other  PCRE-specific  options that can be set by the
-       application when the compile or match functions  are  called.  In  some
+       Note: There are other PCRE-specific options that  can  be  set  by  the
+       application  when  the  compile  or match functions are called. In some
        cases the pattern can contain special leading sequences such as (*CRLF)
-       to override what the application has set or what  has  been  defaulted.
-       Details  are  given  in the section entitled "Newline sequences" above.
-       There are also the (*UTF8) and (*UCP) leading  sequences  that  can  be
-       used  to  set  UTF-8 and Unicode property modes; they are equivalent to
+       to  override  what  the application has set or what has been defaulted.
+       Details are given in the section entitled  "Newline  sequences"  above.
+       There  are  also  the  (*UTF8) and (*UCP) leading sequences that can be
+       used to set UTF-8 and Unicode property modes; they  are  equivalent  to
        setting the PCRE_UTF8 and the PCRE_UCP options, respectively.



@@ -4402,15 +4422,15 @@

          cat(aract|erpillar|)


-       matches  "cataract",  "caterpillar", or "cat". Without the parentheses,
+       matches "cataract", "caterpillar", or "cat". Without  the  parentheses,
        it would match "cataract", "erpillar" or an empty string.


-       2. It sets up the subpattern as  a  capturing  subpattern.  This  means
-       that,  when  the  whole  pattern  matches,  that portion of the subject
+       2.  It  sets  up  the  subpattern as a capturing subpattern. This means
+       that, when the whole pattern  matches,  that  portion  of  the  subject
        string that matched the subpattern is passed back to the caller via the
-       ovector  argument  of pcre_exec(). Opening parentheses are counted from
-       left to right (starting from 1) to obtain  numbers  for  the  capturing
-       subpatterns.  For  example,  if  the  string  "the red king" is matched
+       ovector argument of pcre_exec(). Opening parentheses are  counted  from
+       left  to  right  (starting  from 1) to obtain numbers for the capturing
+       subpatterns. For example, if the  string  "the  red  king"  is  matched
        against the pattern


          the ((red|white) (king|queen))
@@ -4418,12 +4438,12 @@
        the captured substrings are "red king", "red", and "king", and are num-
        bered 1, 2, and 3, respectively.


-       The  fact  that  plain  parentheses  fulfil two functions is not always
-       helpful.  There are often times when a grouping subpattern is  required
-       without  a capturing requirement. If an opening parenthesis is followed
-       by a question mark and a colon, the subpattern does not do any  captur-
-       ing,  and  is  not  counted when computing the number of any subsequent
-       capturing subpatterns. For example, if the string "the white queen"  is
+       The fact that plain parentheses fulfil  two  functions  is  not  always
+       helpful.   There are often times when a grouping subpattern is required
+       without a capturing requirement. If an opening parenthesis is  followed
+       by  a question mark and a colon, the subpattern does not do any captur-
+       ing, and is not counted when computing the  number  of  any  subsequent
+       capturing  subpatterns. For example, if the string "the white queen" is
        matched against the pattern


          the ((?:red|white) (king|queen))
@@ -4431,37 +4451,37 @@
        the captured substrings are "white queen" and "queen", and are numbered
        1 and 2. The maximum number of capturing subpatterns is 65535.


-       As a convenient shorthand, if any option settings are required  at  the
-       start  of  a  non-capturing  subpattern,  the option letters may appear
+       As  a  convenient shorthand, if any option settings are required at the
+       start of a non-capturing subpattern,  the  option  letters  may  appear
        between the "?" and the ":". Thus the two patterns


          (?i:saturday|sunday)
          (?:(?i)saturday|sunday)


        match exactly the same set of strings. Because alternative branches are
-       tried  from  left  to right, and options are not reset until the end of
-       the subpattern is reached, an option setting in one branch does  affect
-       subsequent  branches,  so  the above patterns match "SUNDAY" as well as
+       tried from left to right, and options are not reset until  the  end  of
+       the  subpattern is reached, an option setting in one branch does affect
+       subsequent branches, so the above patterns match "SUNDAY"  as  well  as
        "Saturday".



DUPLICATE SUBPATTERN NUMBERS

        Perl 5.10 introduced a feature whereby each alternative in a subpattern
-       uses  the same numbers for its capturing parentheses. Such a subpattern
-       starts with (?| and is itself a non-capturing subpattern. For  example,
+       uses the same numbers for its capturing parentheses. Such a  subpattern
+       starts  with (?| and is itself a non-capturing subpattern. For example,
        consider this pattern:


          (?|(Sat)ur|(Sun))day


-       Because  the two alternatives are inside a (?| group, both sets of cap-
-       turing parentheses are numbered one. Thus, when  the  pattern  matches,
-       you  can  look  at captured substring number one, whichever alternative
-       matched. This construct is useful when you want to  capture  part,  but
+       Because the two alternatives are inside a (?| group, both sets of  cap-
+       turing  parentheses  are  numbered one. Thus, when the pattern matches,
+       you can look at captured substring number  one,  whichever  alternative
+       matched.  This  construct  is useful when you want to capture part, but
        not all, of one of a number of alternatives. Inside a (?| group, paren-
-       theses are numbered as usual, but the number is reset at the  start  of
-       each  branch.  The numbers of any capturing parentheses that follow the
-       subpattern start after the highest number used in any branch. The  fol-
+       theses  are  numbered as usual, but the number is reset at the start of
+       each branch. The numbers of any capturing parentheses that  follow  the
+       subpattern  start after the highest number used in any branch. The fol-
        lowing example is taken from the Perl documentation. The numbers under-
        neath show in which buffer the captured content will be stored.


@@ -4469,58 +4489,58 @@
          / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
          # 1            2         2  3        2     3     4


-       A back reference to a numbered subpattern uses the  most  recent  value
-       that  is  set  for that number by any subpattern. The following pattern
+       A  back  reference  to a numbered subpattern uses the most recent value
+       that is set for that number by any subpattern.  The  following  pattern
        matches "abcabc" or "defdef":


          /(?|(abc)|(def))\1/


-       In contrast, a subroutine call to a numbered subpattern  always  refers
-       to  the  first  one in the pattern with the given number. The following
+       In  contrast,  a subroutine call to a numbered subpattern always refers
+       to the first one in the pattern with the given  number.  The  following
        pattern matches "abcabc" or "defabc":


          /(?|(abc)|(def))(?1)/


-       If a condition test for a subpattern's having matched refers to a  non-
-       unique  number, the test is true if any of the subpatterns of that num-
+       If  a condition test for a subpattern's having matched refers to a non-
+       unique number, the test is true if any of the subpatterns of that  num-
        ber have matched.


-       An alternative approach to using this "branch reset" feature is to  use
+       An  alternative approach to using this "branch reset" feature is to use
        duplicate named subpatterns, as described in the next section.



NAMED SUBPATTERNS

-       Identifying  capturing  parentheses  by number is simple, but it can be
-       very hard to keep track of the numbers in complicated  regular  expres-
-       sions.  Furthermore,  if  an  expression  is  modified, the numbers may
-       change. To help with this difficulty, PCRE supports the naming of  sub-
+       Identifying capturing parentheses by number is simple, but  it  can  be
+       very  hard  to keep track of the numbers in complicated regular expres-
+       sions. Furthermore, if an  expression  is  modified,  the  numbers  may
+       change.  To help with this difficulty, PCRE supports the naming of sub-
        patterns. This feature was not added to Perl until release 5.10. Python
-       had the feature earlier, and PCRE introduced it at release  4.0,  using
-       the  Python syntax. PCRE now supports both the Perl and the Python syn-
-       tax. Perl allows identically numbered  subpatterns  to  have  different
+       had  the  feature earlier, and PCRE introduced it at release 4.0, using
+       the Python syntax. PCRE now supports both the Perl and the Python  syn-
+       tax.  Perl  allows  identically  numbered subpatterns to have different
        names, but PCRE does not.


-       In  PCRE,  a subpattern can be named in one of three ways: (?<name>...)
-       or (?'name'...) as in Perl, or (?P<name>...) as in  Python.  References
-       to  capturing parentheses from other parts of the pattern, such as back
-       references, recursion, and conditions, can be made by name as  well  as
+       In PCRE, a subpattern can be named in one of three  ways:  (?<name>...)
+       or  (?'name'...)  as in Perl, or (?P<name>...) as in Python. References
+       to capturing parentheses from other parts of the pattern, such as  back
+       references,  recursion,  and conditions, can be made by name as well as
        by number.


-       Names  consist  of  up  to  32 alphanumeric characters and underscores.
-       Named capturing parentheses are still  allocated  numbers  as  well  as
-       names,  exactly as if the names were not present. The PCRE API provides
+       Names consist of up to  32  alphanumeric  characters  and  underscores.
+       Named  capturing  parentheses  are  still  allocated numbers as well as
+       names, exactly as if the names were not present. The PCRE API  provides
        function calls for extracting the name-to-number translation table from
        a compiled pattern. There is also a convenience function for extracting
        a captured substring by name.


-       By default, a name must be unique within a pattern, but it is  possible
+       By  default, a name must be unique within a pattern, but it is possible
        to relax this constraint by setting the PCRE_DUPNAMES option at compile
-       time. (Duplicate names are also always permitted for  subpatterns  with
-       the  same  number, set up as described in the previous section.) Dupli-
-       cate names can be useful for patterns where only one  instance  of  the
-       named  parentheses  can  match. Suppose you want to match the name of a
-       weekday, either as a 3-letter abbreviation or as the full name, and  in
+       time.  (Duplicate  names are also always permitted for subpatterns with
+       the same number, set up as described in the previous  section.)  Dupli-
+       cate  names  can  be useful for patterns where only one instance of the
+       named parentheses can match. Suppose you want to match the  name  of  a
+       weekday,  either as a 3-letter abbreviation or as the full name, and in
        both cases you want to extract the abbreviation. This pattern (ignoring
        the line breaks) does the job:


@@ -4530,38 +4550,38 @@
          (?<DN>Thu)(?:rsday)?|
          (?<DN>Sat)(?:urday)?


-       There are five capturing substrings, but only one is ever set  after  a
+       There  are  five capturing substrings, but only one is ever set after a
        match.  (An alternative way of solving this problem is to use a "branch
        reset" subpattern, as described in the previous section.)


-       The convenience function for extracting the data by  name  returns  the
-       substring  for  the first (and in this example, the only) subpattern of
-       that name that matched. This saves searching  to  find  which  numbered
+       The  convenience  function  for extracting the data by name returns the
+       substring for the first (and in this example, the only)  subpattern  of
+       that  name  that  matched.  This saves searching to find which numbered
        subpattern it was.


-       If  you  make  a  back  reference to a non-unique named subpattern from
-       elsewhere in the pattern, the one that corresponds to the first  occur-
+       If you make a back reference to  a  non-unique  named  subpattern  from
+       elsewhere  in the pattern, the one that corresponds to the first occur-
        rence of the name is used. In the absence of duplicate numbers (see the
-       previous section) this is the one with the lowest number. If you use  a
-       named  reference  in a condition test (see the section about conditions
-       below), either to check whether a subpattern has matched, or  to  check
-       for  recursion,  all  subpatterns with the same name are tested. If the
-       condition is true for any one of them, the overall condition  is  true.
+       previous  section) this is the one with the lowest number. If you use a
+       named reference in a condition test (see the section  about  conditions
+       below),  either  to check whether a subpattern has matched, or to check
+       for recursion, all subpatterns with the same name are  tested.  If  the
+       condition  is  true for any one of them, the overall condition is true.
        This is the same behaviour as testing by number. For further details of
        the interfaces for handling named subpatterns, see the pcreapi documen-
        tation.


        Warning: You cannot use different names to distinguish between two sub-
-       patterns with the same number because PCRE uses only the  numbers  when
+       patterns  with  the same number because PCRE uses only the numbers when
        matching. For this reason, an error is given at compile time if differ-
-       ent names are given to subpatterns with the same number.  However,  you
-       can  give  the same name to subpatterns with the same number, even when
+       ent  names  are given to subpatterns with the same number. However, you
+       can give the same name to subpatterns with the same number,  even  when
        PCRE_DUPNAMES is not set.



REPETITION

-       Repetition is specified by quantifiers, which can  follow  any  of  the
+       Repetition  is  specified  by  quantifiers, which can follow any of the
        following items:


          a literal data character
@@ -4575,17 +4595,17 @@
          a parenthesized subpattern (including assertions)
          a subroutine call to a subpattern (recursive or otherwise)


-       The  general repetition quantifier specifies a minimum and maximum num-
-       ber of permitted matches, by giving the two numbers in  curly  brackets
-       (braces),  separated  by  a comma. The numbers must be less than 65536,
+       The general repetition quantifier specifies a minimum and maximum  num-
+       ber  of  permitted matches, by giving the two numbers in curly brackets
+       (braces), separated by a comma. The numbers must be  less  than  65536,
        and the first must be less than or equal to the second. For example:


          z{2,4}


-       matches "zz", "zzz", or "zzzz". A closing brace on its  own  is  not  a
-       special  character.  If  the second number is omitted, but the comma is
-       present, there is no upper limit; if the second number  and  the  comma
-       are  both omitted, the quantifier specifies an exact number of required
+       matches  "zz",  "zzz",  or  "zzzz". A closing brace on its own is not a
+       special character. If the second number is omitted, but  the  comma  is
+       present,  there  is  no upper limit; if the second number and the comma
+       are both omitted, the quantifier specifies an exact number of  required
        matches. Thus


          [aeiou]{3,}
@@ -4594,50 +4614,50 @@


          \d{8}


-       matches exactly 8 digits. An opening curly bracket that  appears  in  a
-       position  where a quantifier is not allowed, or one that does not match
-       the syntax of a quantifier, is taken as a literal character. For  exam-
+       matches  exactly  8  digits. An opening curly bracket that appears in a
+       position where a quantifier is not allowed, or one that does not  match
+       the  syntax of a quantifier, is taken as a literal character. For exam-
        ple, {,6} is not a quantifier, but a literal string of four characters.


-       In  UTF-8  mode,  quantifiers  apply to UTF-8 characters rather than to
+       In UTF-8 mode, quantifiers apply to UTF-8  characters  rather  than  to
        individual bytes. Thus, for example, \x{100}{2} matches two UTF-8 char-
        acters, each of which is represented by a two-byte sequence. Similarly,
        when Unicode property support is available, \X{3} matches three Unicode
-       extended  sequences,  each of which may be several bytes long (and they
+       extended sequences, each of which may be several bytes long  (and  they
        may be of different lengths).


        The quantifier {0} is permitted, causing the expression to behave as if
        the previous item and the quantifier were not present. This may be use-
-       ful for subpatterns that are referenced as subroutines  from  elsewhere
+       ful  for  subpatterns that are referenced as subroutines from elsewhere
        in the pattern (but see also the section entitled "Defining subpatterns
-       for use by reference only" below). Items other  than  subpatterns  that
+       for  use  by  reference only" below). Items other than subpatterns that
        have a {0} quantifier are omitted from the compiled pattern.


-       For  convenience, the three most common quantifiers have single-charac-
+       For convenience, the three most common quantifiers have  single-charac-
        ter abbreviations:


          *    is equivalent to {0,}
          +    is equivalent to {1,}
          ?    is equivalent to {0,1}


-       It is possible to construct infinite loops by  following  a  subpattern
+       It  is  possible  to construct infinite loops by following a subpattern
        that can match no characters with a quantifier that has no upper limit,
        for example:


          (a?)*


        Earlier versions of Perl and PCRE used to give an error at compile time
-       for  such  patterns. However, because there are cases where this can be
-       useful, such patterns are now accepted, but if any  repetition  of  the
-       subpattern  does in fact match no characters, the loop is forcibly bro-
+       for such patterns. However, because there are cases where this  can  be
+       useful,  such  patterns  are now accepted, but if any repetition of the
+       subpattern does in fact match no characters, the loop is forcibly  bro-
        ken.


-       By default, the quantifiers are "greedy", that is, they match  as  much
-       as  possible  (up  to  the  maximum number of permitted times), without
-       causing the rest of the pattern to fail. The classic example  of  where
+       By  default,  the quantifiers are "greedy", that is, they match as much
+       as possible (up to the maximum  number  of  permitted  times),  without
+       causing  the  rest of the pattern to fail. The classic example of where
        this gives problems is in trying to match comments in C programs. These
-       appear between /* and */ and within the comment,  individual  *  and  /
-       characters  may  appear. An attempt to match C comments by applying the
+       appear  between  /*  and  */ and within the comment, individual * and /
+       characters may appear. An attempt to match C comments by  applying  the
        pattern


          /\*.*\*/
@@ -4646,19 +4666,19 @@


          /* first comment */  not comment  /* second comment */


-       fails, because it matches the entire string owing to the greediness  of
+       fails,  because it matches the entire string owing to the greediness of
        the .*  item.


-       However,  if  a quantifier is followed by a question mark, it ceases to
+       However, if a quantifier is followed by a question mark, it  ceases  to
        be greedy, and instead matches the minimum number of times possible, so
        the pattern


          /\*.*?\*/


-       does  the  right  thing with the C comments. The meaning of the various
-       quantifiers is not otherwise changed,  just  the  preferred  number  of
-       matches.   Do  not  confuse this use of question mark with its use as a
-       quantifier in its own right. Because it has two uses, it can  sometimes
+       does the right thing with the C comments. The meaning  of  the  various
+       quantifiers  is  not  otherwise  changed,  just the preferred number of
+       matches.  Do not confuse this use of question mark with its  use  as  a
+       quantifier  in its own right. Because it has two uses, it can sometimes
        appear doubled, as in


          \d??\d
@@ -4666,36 +4686,36 @@
        which matches one digit by preference, but can match two if that is the
        only way the rest of the pattern matches.


-       If the PCRE_UNGREEDY option is set (an option that is not available  in
-       Perl),  the  quantifiers are not greedy by default, but individual ones
-       can be made greedy by following them with a  question  mark.  In  other
+       If  the PCRE_UNGREEDY option is set (an option that is not available in
+       Perl), the quantifiers are not greedy by default, but  individual  ones
+       can  be  made  greedy  by following them with a question mark. In other
        words, it inverts the default behaviour.


-       When  a  parenthesized  subpattern  is quantified with a minimum repeat
-       count that is greater than 1 or with a limited maximum, more memory  is
-       required  for  the  compiled  pattern, in proportion to the size of the
+       When a parenthesized subpattern is quantified  with  a  minimum  repeat
+       count  that is greater than 1 or with a limited maximum, more memory is
+       required for the compiled pattern, in proportion to  the  size  of  the
        minimum or maximum.


        If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equiv-
-       alent  to  Perl's  /s) is set, thus allowing the dot to match newlines,
-       the pattern is implicitly anchored, because whatever  follows  will  be
-       tried  against every character position in the subject string, so there
-       is no point in retrying the overall match at  any  position  after  the
-       first.  PCRE  normally treats such a pattern as though it were preceded
+       alent to Perl's /s) is set, thus allowing the dot  to  match  newlines,
+       the  pattern  is  implicitly anchored, because whatever follows will be
+       tried against every character position in the subject string, so  there
+       is  no  point  in  retrying the overall match at any position after the
+       first. PCRE normally treats such a pattern as though it  were  preceded
        by \A.


-       In cases where it is known that the subject  string  contains  no  new-
-       lines,  it  is  worth setting PCRE_DOTALL in order to obtain this opti-
+       In  cases  where  it  is known that the subject string contains no new-
+       lines, it is worth setting PCRE_DOTALL in order to  obtain  this  opti-
        mization, or alternatively using ^ to indicate anchoring explicitly.


-       However, there is one situation where the optimization cannot be  used.
+       However,  there is one situation where the optimization cannot be used.
        When .*  is inside capturing parentheses that are the subject of a back
        reference elsewhere in the pattern, a match at the start may fail where
        a later one succeeds. Consider, for example:


          (.*)abc\1


-       If  the subject is "xyz123abc123" the match point is the fourth charac-
+       If the subject is "xyz123abc123" the match point is the fourth  charac-
        ter. For this reason, such a pattern is not implicitly anchored.


        When a capturing subpattern is repeated, the value captured is the sub-
@@ -4704,8 +4724,8 @@
          (tweedle[dume]{3}\s*)+


        has matched "tweedledum tweedledee" the value of the captured substring
-       is "tweedledee". However, if there are  nested  capturing  subpatterns,
-       the  corresponding captured values may have been set in previous itera-
+       is  "tweedledee".  However,  if there are nested capturing subpatterns,
+       the corresponding captured values may have been set in previous  itera-
        tions. For example, after


          /(a|(b))+/
@@ -4715,53 +4735,53 @@


ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS

-       With both maximizing ("greedy") and minimizing ("ungreedy"  or  "lazy")
-       repetition,  failure  of what follows normally causes the repeated item
-       to be re-evaluated to see if a different number of repeats  allows  the
-       rest  of  the pattern to match. Sometimes it is useful to prevent this,
-       either to change the nature of the match, or to cause it to fail earlier
-       than  it otherwise might, when the author of the pattern knows there is
+       With  both  maximizing ("greedy") and minimizing ("ungreedy" or "lazy")
+       repetition, failure of what follows normally causes the  repeated  item
+       to  be  re-evaluated to see if a different number of repeats allows the
+       rest of the pattern to match. Sometimes it is useful to  prevent  this,
+       either to change the nature of the match, or to cause it to fail earlier
+       than it otherwise might, when the author of the pattern knows there  is
        no point in carrying on.


-       Consider, for example, the pattern \d+foo when applied to  the  subject
+       Consider,  for  example, the pattern \d+foo when applied to the subject
        line


          123456bar


        After matching all 6 digits and then failing to match "foo", the normal
-       action of the matcher is to try again with only 5 digits  matching  the
-       \d+  item,  and  then  with  4,  and  so on, before ultimately failing.
-       "Atomic grouping" (a term taken from Jeffrey  Friedl's  book)  provides
-       the  means for specifying that once a subpattern has matched, it is not
+       action  of  the matcher is to try again with only 5 digits matching the
+       \d+ item, and then with  4,  and  so  on,  before  ultimately  failing.
+       "Atomic  grouping"  (a  term taken from Jeffrey Friedl's book) provides
+       the means for specifying that once a subpattern has matched, it is  not
        to be re-evaluated in this way.


-       If we use atomic grouping for the previous example, the  matcher  gives
-       up  immediately  on failing to match "foo" the first time. The notation
+       If  we  use atomic grouping for the previous example, the matcher gives
+       up immediately on failing to match "foo" the first time.  The  notation
        is a kind of special parenthesis, starting with (?> as in this example:


          (?>\d+)foo


-       This kind of parenthesis "locks up" the  part of the  pattern  it  con-
-       tains  once  it  has matched, and a failure further into the pattern is
-       prevented from backtracking into it. Backtracking past it  to  previous
+       This  kind  of  parenthesis "locks up" the  part of the pattern it con-
+       tains once it has matched, and a failure further into  the  pattern  is
+       prevented  from  backtracking into it. Backtracking past it to previous
        items, however, works as normal.


-       An  alternative  description  is that a subpattern of this type matches
-       the string of characters that an  identical  standalone  pattern  would
+       An alternative description is that a subpattern of  this  type  matches
+       the  string  of  characters  that an identical standalone pattern would
        match, if anchored at the current point in the subject string.


        Atomic grouping subpatterns are not capturing subpatterns. Simple cases
        such as the above example can be thought of as a maximizing repeat that
-       must  swallow  everything  it can. So, while both \d+ and \d+? are pre-
-       pared to adjust the number of digits they match in order  to  make  the
+       must swallow everything it can. So, while both \d+ and  \d+?  are  pre-
+       pared  to  adjust  the number of digits they match in order to make the
        rest of the pattern match, (?>\d+) can only match an entire sequence of
        digits.


-       Atomic groups in general can of course contain arbitrarily  complicated
-       subpatterns,  and  can  be  nested. However, when the subpattern for an
+       Atomic  groups in general can of course contain arbitrarily complicated
+       subpatterns, and can be nested. However, when  the  subpattern  for  an
        atomic group is just a single repeated item, as in the example above, a
-       simpler  notation,  called  a "possessive quantifier" can be used. This
-       consists of an additional + character  following  a  quantifier.  Using
+       simpler notation, called a "possessive quantifier" can  be  used.  This
+       consists  of  an  additional  + character following a quantifier. Using
        this notation, the previous example can be rewritten as


          \d++foo
@@ -4771,45 +4791,45 @@


          (abc|xyz){2,3}+


-       Possessive  quantifiers  are  always  greedy;  the   setting   of   the
+       Possessive   quantifiers   are   always  greedy;  the  setting  of  the
        PCRE_UNGREEDY option is ignored. They are a convenient notation for the
-       simpler forms of atomic group. However, there is no difference  in  the
-       meaning  of  a  possessive  quantifier and the equivalent atomic group,
-       though there may be a performance  difference;  possessive  quantifiers
+       simpler  forms  of atomic group. However, there is no difference in the
+       meaning of a possessive quantifier and  the  equivalent  atomic  group,
+       though  there  may  be a performance difference; possessive quantifiers
        should be slightly faster.


-       The  possessive  quantifier syntax is an extension to the Perl 5.8 syn-
-       tax.  Jeffrey Friedl originated the idea (and the name)  in  the  first
+       The possessive quantifier syntax is an extension to the Perl  5.8  syn-
+       tax.   Jeffrey  Friedl  originated the idea (and the name) in the first
        edition of his book. Mike McCloskey liked it, so implemented it when he
-       built Sun's Java package, and PCRE copied it from there. It  ultimately
+       built  Sun's Java package, and PCRE copied it from there. It ultimately
        found its way into Perl at release 5.10.


        PCRE has an optimization that automatically "possessifies" certain sim-
-       ple pattern constructs. For example, the sequence  A+B  is  treated  as
-       A++B  because  there is no point in backtracking into a sequence of A's
+       ple  pattern  constructs.  For  example, the sequence A+B is treated as
+       A++B because there is no point in backtracking into a sequence  of  A's
        when B must follow.


-       When a pattern contains an unlimited repeat inside  a  subpattern  that
-       can  itself  be  repeated  an  unlimited number of times, the use of an
-       atomic group is the only way to avoid some  failing  matches  taking  a
+       When  a  pattern  contains an unlimited repeat inside a subpattern that
+       can itself be repeated an unlimited number of  times,  the  use  of  an
+       atomic  group  is  the  only way to avoid some failing matches taking a
        very long time indeed. The pattern


          (\D+|<\d+>)*[!?]


-       matches  an  unlimited number of substrings that either consist of non-
-       digits, or digits enclosed in <>, followed by either ! or  ?.  When  it
+       matches an unlimited number of substrings that either consist  of  non-
+       digits,  or  digits  enclosed in <>, followed by either ! or ?. When it
        matches, it runs quickly. However, if it is applied to


          aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa


-       it  takes  a  long  time  before reporting failure. This is because the
-       string can be divided between the internal \D+ repeat and the  external
-       *  repeat  in  a  large  number of ways, and all have to be tried. (The
-       example uses [!?] rather than a single character at  the  end,  because
-       both  PCRE  and  Perl have an optimization that allows for fast failure
-       when a single character is used. They remember the last single  charac-
-       ter  that  is required for a match, and fail early if it is not present
-       in the string.) If the pattern is changed so that  it  uses  an  atomic
+       it takes a long time before reporting  failure.  This  is  because  the
+       string  can be divided between the internal \D+ repeat and the external
+       * repeat in a large number of ways, and all  have  to  be  tried.  (The
+       example  uses  [!?]  rather than a single character at the end, because
+       both PCRE and Perl have an optimization that allows  for  fast  failure
+       when  a single character is used. They remember the last single charac-
+       ter that is required for a match, and fail early if it is  not  present
+       in  the  string.)  If  the pattern is changed so that it uses an atomic
        group, like this:


          ((?>\D+)|<\d+>)*[!?]
@@ -4821,28 +4841,28 @@


        Outside a character class, a backslash followed by a digit greater than
        0 (and possibly further digits) is a back reference to a capturing sub-
-       pattern  earlier  (that is, to its left) in the pattern, provided there
+       pattern earlier (that is, to its left) in the pattern,  provided  there
        have been that many previous capturing left parentheses.


        However, if the decimal number following the backslash is less than 10,
-       it  is  always  taken  as a back reference, and causes an error only if
-       there are not that many capturing left parentheses in the  entire  pat-
-       tern.  In  other words, the parentheses that are referenced need not be
-       to the left of the reference for numbers less than 10. A "forward  back
-       reference"  of  this  type can make sense when a repetition is involved
-       and the subpattern to the right has participated in an  earlier  itera-
+       it is always taken as a back reference, and causes  an  error  only  if
+       there  are  not that many capturing left parentheses in the entire pat-
+       tern. In other words, the parentheses that are referenced need  not  be
+       to  the left of the reference for numbers less than 10. A "forward back
+       reference" of this type can make sense when a  repetition  is  involved
+       and  the  subpattern to the right has participated in an earlier itera-
        tion.


-       It  is  not  possible to have a numerical "forward back reference" to a
-       subpattern whose number is 10 or  more  using  this  syntax  because  a
-       sequence  such  as  \50 is interpreted as a character defined in octal.
+       It is not possible to have a numerical "forward back  reference"  to  a
+       subpattern  whose  number  is  10  or  more using this syntax because a
+       sequence such as \50 is interpreted as a character  defined  in  octal.
        See the subsection entitled "Non-printing characters" above for further
-       details  of  the  handling of digits following a backslash. There is no
-       such problem when named parentheses are used. A back reference  to  any
+       details of the handling of digits following a backslash.  There  is  no
+       such  problem  when named parentheses are used. A back reference to any
        subpattern is possible using named parentheses (see below).


-       Another  way  of  avoiding  the ambiguity inherent in the use of digits
-       following a backslash is to use the \g  escape  sequence.  This  escape
+       Another way of avoiding the ambiguity inherent in  the  use  of  digits
+       following  a  backslash  is  to use the \g escape sequence. This escape
        must be followed by an unsigned number or a negative number, optionally
        enclosed in braces. These examples are all identical:


@@ -4850,7 +4870,7 @@
          (ring), \g1
          (ring), \g{1}


-       An unsigned number specifies an absolute reference without the  ambigu-
+       An  unsigned number specifies an absolute reference without the ambigu-
        ity that is present in the older syntax. It is also useful when literal
        digits follow the reference. A negative number is a relative reference.
        Consider this example:
@@ -4859,33 +4879,33 @@


        The sequence \g{-1} is a reference to the most recently started captur-
        ing subpattern before \g, that is, it is equivalent to \2 in this exam-
-       ple.   Similarly, \g{-2} would be equivalent to \1. The use of relative
-       references can be helpful in long patterns, and also in  patterns  that
-       are  created  by  joining  together  fragments  that contain references
+       ple.  Similarly, \g{-2} would be equivalent to \1. The use of  relative
+       references  can  be helpful in long patterns, and also in patterns that
+       are created by  joining  together  fragments  that  contain  references
        within themselves.


-       A back reference matches whatever actually matched the  capturing  sub-
-       pattern  in  the  current subject string, rather than anything matching
+       A  back  reference matches whatever actually matched the capturing sub-
+       pattern in the current subject string, rather  than  anything  matching
        the subpattern itself (see "Subpatterns as subroutines" below for a way
        of doing that). So the pattern


          (sens|respons)e and \1ibility


-       matches  "sense and sensibility" and "response and responsibility", but
-       not "sense and responsibility". If caseful matching is in force at  the
-       time  of the back reference, the case of letters is relevant. For exam-
+       matches "sense and sensibility" and "response and responsibility",  but
+       not  "sense and responsibility". If caseful matching is in force at the
+       time of the back reference, the case of letters is relevant. For  exam-
        ple,


          ((?i)rah)\s+\1


-       matches "rah rah" and "RAH RAH", but not "RAH  rah",  even  though  the
+       matches  "rah  rah"  and  "RAH RAH", but not "RAH rah", even though the
        original capturing subpattern is matched caselessly.
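A quick way to try the two patterns above outside PCRE is Python's standard `re` module. This is a sketch under an assumption: `re` is a separate engine, but it treats these numbered back references the same way. Recent Python versions reject a global `(?i)` that is not at the start of a pattern, so the sketch uses the scoped form `(?i:rah)` in place of `((?i)rah)`. The names SENSE and RAH are illustrative only.

```python
import re

# \1 must repeat the text group 1 actually matched ("sens" or "respons").
SENSE = re.compile(r"(sens|respons)e and \1ibility")

# Only the group is caseless; the \1 reference after it is caseful.
RAH = re.compile(r"((?i:rah))\s+\1")

print(bool(SENSE.fullmatch("sense and sensibility")))        # True
print(bool(SENSE.fullmatch("response and responsibility")))  # True
print(bool(SENSE.fullmatch("sense and responsibility")))     # False
print(bool(RAH.fullmatch("RAH RAH")))                        # True
print(bool(RAH.fullmatch("RAH rah")))                        # False
```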


-       There  are  several  different ways of writing back references to named
-       subpatterns. The .NET syntax \k{name} and the Perl syntax  \k<name>  or
-       \k'name'  are supported, as is the Python syntax (?P=name). Perl 5.10's
+       There are several different ways of writing back  references  to  named
+       subpatterns.  The  .NET syntax \k{name} and the Perl syntax \k<name> or
+       \k'name' are supported, as is the Python syntax (?P=name). Perl  5.10's
        unified back reference syntax, in which \g can be used for both numeric
-       and  named  references,  is  also supported. We could rewrite the above
+       and named references, is also supported. We  could  rewrite  the  above
        example in any of the following ways:


          (?<p1>(?i)rah)\s+\k<p1>
@@ -4893,83 +4913,83 @@
          (?P<p1>(?i)rah)\s+(?P=p1)
          (?<p1>(?i)rah)\s+\g{p1}
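Of the named-reference spellings listed above, Python's `re` module (again only a stand-in for PCRE) accepts the Python form, `(?P<name>...)` with `(?P=name)`. A minimal sketch of the same caseful check, assuming the scoped `(?i:...)` flag syntax; RAH_NAMED is a hypothetical name:

```python
import re

# Named group p1 is matched caselessly; the (?P=p1) reference is caseful.
RAH_NAMED = re.compile(r"(?P<p1>(?i:rah))\s+(?P=p1)")

print(RAH_NAMED.fullmatch("RAH RAH").group("p1"))  # RAH
print(RAH_NAMED.fullmatch("RAH rah") is None)      # True
```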


-       A subpattern that is referenced by  name  may  appear  in  the  pattern
+       A  subpattern  that  is  referenced  by  name may appear in the pattern
        before or after the reference.


-       There  may be more than one back reference to the same subpattern. If a
-       subpattern has not actually been used in a particular match,  any  back
+       There may be more than one back reference to the same subpattern. If  a
+       subpattern  has  not actually been used in a particular match, any back
        references to it always fail by default. For example, the pattern


          (a|(bc))\2


-       always  fails  if  it starts to match "a" rather than "bc". However, if
+       always fails if it starts to match "a" rather than  "bc".  However,  if
        the PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back refer-
        ence to an unset value matches an empty string.
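The default behavior can be checked with Python's `re`, used here only as a stand-in for PCRE; this sketch does not cover the PCRE_JAVASCRIPT_COMPAT variant. A reference to a group that took no part in the match simply fails:

```python
import re

# Group 2 is set only when the "bc" alternative of group 1 is taken.
PAT = re.compile(r"(a|(bc))\2")

print(PAT.fullmatch("bcbc") is not None)  # True: group 2 is "bc"
print(PAT.fullmatch("a") is None)         # True: group 2 unset, so \2 fails
```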


-       Because  there may be many capturing parentheses in a pattern, all dig-
-       its following a backslash are taken as part of a potential back  refer-
-       ence  number.   If  the  pattern continues with a digit character, some
-       delimiter must  be  used  to  terminate  the  back  reference.  If  the
+       Because there may be many capturing parentheses in a pattern, all  dig-
+       its  following a backslash are taken as part of a potential back refer-
+       ence number.  If the pattern continues with  a  digit  character,  some
+       delimiter  must  be  used  to  terminate  the  back  reference.  If the
        PCRE_EXTENDED option is set, this can be whitespace. Otherwise, the \g{
        syntax or an empty comment (see "Comments" below) can be used.
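Both delimiters can be tried in Python's `re`, assuming it as a stand-in for PCRE and `re.VERBOSE` as its counterpart of PCRE_EXTENDED. Note that the undelimited case is engine-specific: Python reads `\11` as a reference to group 11 and rejects the pattern, whereas PCRE may fall back to an octal escape.

```python
import re

# An empty comment (?#) ends the reference, so the final "1" is a literal.
COMMENT = re.compile(r"(a)\1(?#)1")

# In verbose mode a space does the same job.
VERBOSE = re.compile(r"(a)\1 1", re.VERBOSE)

# Written as "(a)\11", Python looks for group 11 and raises re.error.
try:
    re.compile(r"(a)\11")
    undelimited_ok = True
except re.error:
    undelimited_ok = False

print(bool(COMMENT.fullmatch("aa1")))  # True
print(bool(VERBOSE.fullmatch("aa1")))  # True
print(undelimited_ok)                  # False
```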


    Recursive back references


-       A back reference that occurs inside the parentheses to which it  refers
-       fails  when  the subpattern is first used, so, for example, (a\1) never
-       matches.  However, such references can be useful inside  repeated  sub-
+       A  back reference that occurs inside the parentheses to which it refers
+       fails when the subpattern is first used, so, for example,  (a\1)  never
+       matches.   However,  such references can be useful inside repeated sub-
        patterns. For example, the pattern


          (a|b\1)+


        matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
-       ation of the subpattern,  the  back  reference  matches  the  character
-       string  corresponding  to  the previous iteration. In order for this to
-       work, the pattern must be such that the first iteration does  not  need
-       to  match the back reference. This can be done using alternation, as in
+       ation  of  the  subpattern,  the  back  reference matches the character
+       string corresponding to the previous iteration. In order  for  this  to
+       work,  the  pattern must be such that the first iteration does not need
+       to match the back reference. This can be done using alternation, as  in
        the example above, or by a quantifier with a minimum of zero.


-       Back references of this type cause the group that they reference to  be
-       treated  as  an atomic group.  Once the whole group has been matched, a
-       subsequent matching failure cannot cause backtracking into  the  middle
+       Back  references of this type cause the group that they reference to be
+       treated as an atomic group.  Once the whole group has been  matched,  a
+       subsequent  matching  failure cannot cause backtracking into the middle
        of the group.



ASSERTIONS

-       An  assertion  is  a  test on the characters following or preceding the
-       current matching point that does not actually consume  any  characters.
-       The  simple  assertions  coded  as  \b, \B, \A, \G, \Z, \z, ^ and $ are
+       An assertion is a test on the characters  following  or  preceding  the
+       current  matching  point that does not actually consume any characters.
+       The simple assertions coded as \b, \B, \A, \G, \Z,  \z,  ^  and  $  are
        described above.


-       More complicated assertions are coded as  subpatterns.  There  are  two
-       kinds:  those  that  look  ahead of the current position in the subject
-       string, and those that look  behind  it.  An  assertion  subpattern  is
-       matched  in  the  normal way, except that it does not cause the current
+       More  complicated  assertions  are  coded as subpatterns. There are two
+       kinds: those that look ahead of the current  position  in  the  subject
+       string,  and  those  that  look  behind  it. An assertion subpattern is
+       matched in the normal way, except that it does not  cause  the  current
        matching position to be changed.


-       Assertion subpatterns are not capturing subpatterns. If such an  asser-
-       tion  contains  capturing  subpatterns within it, these are counted for
-       the purposes of numbering the capturing subpatterns in the  whole  pat-
-       tern.  However,  substring  capturing  is carried out only for positive
+       Assertion  subpatterns are not capturing subpatterns. If such an asser-
+       tion contains capturing subpatterns within it, these  are  counted  for
+       the  purposes  of numbering the capturing subpatterns in the whole pat-
+       tern. However, substring capturing is carried  out  only  for  positive
        assertions, because it does not make sense for negative assertions.


-       For compatibility with Perl, assertion  subpatterns  may  be  repeated;
-       though  it  makes  no sense to assert the same thing several times, the
-       side effect of capturing parentheses may  occasionally  be  useful.  In
+       For  compatibility  with  Perl,  assertion subpatterns may be repeated;
+       though it makes no sense to assert the same thing  several  times,  the
+       side  effect  of  capturing  parentheses may occasionally be useful. In
        practice, there are only three cases:


-       (1)  If  the  quantifier  is  {0}, the assertion is never obeyed during
-       matching.  However, it may  contain  internal  capturing  parenthesized
+       (1) If the quantifier is {0}, the  assertion  is  never  obeyed  during
+       matching.   However,  it  may  contain internal capturing parenthesized
        groups that are called from elsewhere via the subroutine mechanism.


-       (2)  If the quantifier is {0,n} where n is greater than zero, it is treated
-       as if it were {0,1}. At run time, the rest  of  the  pattern  match  is
+       (2) If the quantifier is {0,n} where n is greater than zero, it is treated
+       as  if  it  were  {0,1}.  At run time, the rest of the pattern match is
        tried with and without the assertion, the order depending on the greed-
        iness of the quantifier.


-       (3) If the minimum repetition is greater than zero, the  quantifier  is
-       ignored.   The  assertion  is  obeyed just once when encountered during
+       (3)  If  the minimum repetition is greater than zero, the quantifier is
+       ignored.  The assertion is obeyed just  once  when  encountered  during
        matching.


    Lookahead assertions
@@ -4979,38 +4999,38 @@


          \w+(?=;)


-       matches  a word followed by a semicolon, but does not include the semi-
+       matches a word followed by a semicolon, but does not include the  semi-
        colon in the match, and


          foo(?!bar)


-       matches any occurrence of "foo" that is not  followed  by  "bar".  Note
+       matches  any  occurrence  of  "foo" that is not followed by "bar". Note
        that the apparently similar pattern


          (?!foo)bar


-       does  not  find  an  occurrence  of "bar" that is preceded by something
-       other than "foo"; it finds any occurrence of "bar" whatsoever,  because
+       does not find an occurrence of "bar"  that  is  preceded  by  something
+       other  than "foo"; it finds any occurrence of "bar" whatsoever, because
        the assertion (?!foo) is always true when the next three characters are
        "bar". A lookbehind assertion is needed to achieve the other effect.

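The three lookahead patterns behave the same way in Python's `re` module, used here only as a stand-in for PCRE; the subject strings are made up for illustration:

```python
import re

# Positive lookahead: ";" must follow, but it is not part of the match.
word = re.search(r"\w+(?=;)", "x = value;").group()

# Negative lookahead: "foo" not followed by "bar".
starts = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# (?!foo)bar does not look behind, so it still finds "bar" right after "foo".
bar_at = re.search(r"(?!foo)bar", "foobar").start()

print(word)    # value
print(starts)  # [7]
print(bar_at)  # 3
```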

        If you want to force a matching failure at some point in a pattern, the
-       most  convenient  way  to  do  it  is with (?!) because an empty string
-       always matches, so an assertion that requires there not to be an  empty
+       most convenient way to do it is  with  (?!)  because  an  empty  string
+       always  matches, so an assertion that requires there not to be an empty
        string must always fail.  The backtracking control verb (*FAIL) or (*F)
        is a synonym for (?!).


    Lookbehind assertions


-       Lookbehind assertions start with (?<= for positive assertions and  (?<!
+       Lookbehind  assertions start with (?<= for positive assertions and (?<!
        for negative assertions. For example,


          (?<!foo)bar


-       does  find  an  occurrence  of "bar" that is not preceded by "foo". The
-       contents of a lookbehind assertion are restricted  such  that  all  the
+       does find an occurrence of "bar" that is not  preceded  by  "foo".  The
+       contents  of  a  lookbehind  assertion are restricted such that all the
        strings it matches must have a fixed length. However, if there are sev-
-       eral top-level alternatives, they do not all  have  to  have  the  same
+       eral  top-level  alternatives,  they  do  not all have to have the same
        fixed length. Thus


          (?<=bullock|donkey)
@@ -5019,61 +5039,61 @@


          (?<!dogs?|cats?)


-       causes  an  error at compile time. Branches that match different length
-       strings are permitted only at the top level of a lookbehind  assertion.
+       causes an error at compile time. Branches that match  different  length
+       strings  are permitted only at the top level of a lookbehind assertion.
        This is an extension compared with Perl, which requires all branches to
        match the same length of string. An assertion such as


          (?<=ab(c|de))


-       is not permitted, because its single top-level  branch  can  match  two
+       is  not  permitted,  because  its single top-level branch can match two
        different lengths, but it is acceptable to PCRE if rewritten to use two
        top-level branches:


          (?<=abc|abde)
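Engines differ on this point. Python's `re` requires the whole lookbehind to have a single fixed width, so it rejects even the top-level form `(?<=abc|abde)` that PCRE accepts. A sketch assuming Python: a common workaround is one fixed-width lookbehind per length, joined by alternation. PAT is a hypothetical name.

```python
import re

# Python rejects a lookbehind whose branches have different lengths.
rejected = False
try:
    re.compile(r"(?<=abc|abde)x")
except re.error as exc:
    rejected = True
    print("rejected:", exc)

# Workaround: one fixed-width lookbehind per length, joined by alternation.
PAT = re.compile(r"(?:(?<=abc)|(?<=abde))x")
print([m.start() for m in PAT.finditer("abcx abdex abx")])  # [3, 9]
```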


-       In some cases, the escape sequence \K (see above) can be  used  instead
+       In  some  cases, the escape sequence \K (see above) can be used instead
        of a lookbehind assertion to get round the fixed-length restriction.


-       The  implementation  of lookbehind assertions is, for each alternative,
-       to temporarily move the current position back by the fixed  length  and
+       The implementation of lookbehind assertions is, for  each  alternative,
+       to  temporarily  move the current position back by the fixed length and
        then try to match. If there are insufficient characters before the cur-
        rent position, the assertion fails.


        PCRE does not allow the \C escape (which matches a single byte in UTF-8
-       mode)  to appear in lookbehind assertions, because it makes it impossi-
-       ble to calculate the length of the lookbehind. The \X and  \R  escapes,
+       mode) to appear in lookbehind assertions, because it makes it  impossi-
+       ble  to  calculate the length of the lookbehind. The \X and \R escapes,
        which can match different numbers of bytes, are also not permitted.


-       "Subroutine"  calls  (see below) such as (?2) or (?&X) are permitted in
-       lookbehinds, as long as the subpattern matches a  fixed-length  string.
+       "Subroutine" calls (see below) such as (?2) or (?&X) are  permitted  in
+       lookbehinds,  as  long as the subpattern matches a fixed-length string.
        Recursion, however, is not supported.


-       Possessive  quantifiers  can  be  used  in  conjunction with lookbehind
+       Possessive quantifiers can  be  used  in  conjunction  with  lookbehind
        assertions to specify efficient matching of fixed-length strings at the
        end of subject strings. Consider a simple pattern such as


          abcd$


-       when  applied  to  a  long string that does not match. Because matching
+       when applied to a long string that does  not  match.  Because  matching
        proceeds from left to right, PCRE will look for each "a" in the subject
-       and  then  see  if what follows matches the rest of the pattern. If the
+       and then see if what follows matches the rest of the  pattern.  If  the
        pattern is specified as


          ^.*abcd$


-       the initial .* matches the entire string at first, but when this  fails
+       the  initial .* matches the entire string at first, but when this fails
        (because there is no following "a"), it backtracks to match all but the
-       last character, then all but the last two characters, and so  on.  Once
-       again  the search for "a" covers the entire string, from right to left,
+       last  character,  then all but the last two characters, and so on. Once
+       again the search for "a" covers the entire string, from right to  left,
        so we are no better off. However, if the pattern is written as


          ^.*+(?<=abcd)


-       there can be no backtracking for the .*+ item; it can  match  only  the
-       entire  string.  The subsequent lookbehind assertion does a single test
-       on the last four characters. If it fails, the match fails  immediately.
-       For  long  strings, this approach makes a significant difference to the
+       there  can  be  no backtracking for the .*+ item; it can match only the
+       entire string. The subsequent lookbehind assertion does a  single  test
+       on  the last four characters. If it fails, the match fails immediately.
+       For long strings, this approach makes a significant difference  to  the
        processing time.


    Using multiple assertions
@@ -5082,18 +5102,18 @@


          (?<=\d{3})(?<!999)foo


-       matches "foo" preceded by three digits that are not "999". Notice  that
-       each  of  the  assertions is applied independently at the same point in
-       the subject string. First there is a  check  that  the  previous  three
-       characters  are  all  digits,  and  then there is a check that the same
+       matches  "foo" preceded by three digits that are not "999". Notice that
+       each of the assertions is applied independently at the  same  point  in
+       the  subject  string.  First  there  is a check that the previous three
+       characters are all digits, and then there is  a  check  that  the  same
        three characters are not "999".  This pattern does not match "foo" pre-
-       ceded by six characters, the first three of which are digits and the last
-       three of which are not "999". For example, it doesn't match
+       ceded by six characters, the first three of which are digits and the last
+       three of which are not "999". For example, it doesn't match
        "123abcfoo". A pattern to do that is


          (?<=\d{3}...)(?<!999)foo


-       This  time  the  first assertion looks at the preceding six characters,
+       This time the first assertion looks at the  preceding  six  characters,
        checking that the first three are digits, and then the second assertion
        checks that the preceding three characters are not "999".
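The two patterns can be compared side by side in Python's `re` (a stand-in for PCRE; both engines accept these fixed-length lookbehinds). The subjects are illustrative:

```python
import re

# Three digits immediately before "foo", and they are not "999".
THREE_DIGITS = re.compile(r"(?<=\d{3})(?<!999)foo")
# Six characters before "foo": three digits, then three that are not "999".
SIX_CHARS = re.compile(r"(?<=\d{3}...)(?<!999)foo")

for subject in ["123foo", "999foo", "123abcfoo"]:
    print(subject, bool(THREE_DIGITS.search(subject)), bool(SIX_CHARS.search(subject)))
# 123foo True False
# 999foo False False
# 123abcfoo False True
```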


@@ -5101,29 +5121,29 @@

          (?<=(?<!foo)bar)baz


-       matches  an occurrence of "baz" that is preceded by "bar" which in turn
+       matches an occurrence of "baz" that is preceded by "bar" which in  turn
        is not preceded by "foo", while


          (?<=\d{3}(?!999)...)foo


-       is another pattern that matches "foo" preceded by three digits and  any
+       is  another pattern that matches "foo" preceded by three digits and any
        three characters that are not "999".
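Nested assertions also work in Python's `re`, assumed here as a stand-in for PCRE, because each lookbehind still has a fixed width (the inner assertions have zero width). NESTED and INNER are hypothetical names:

```python
import re

# "baz" after "bar", where that "bar" is not itself preceded by "foo".
NESTED = re.compile(r"(?<=(?<!foo)bar)baz")

# "foo" after three digits followed by three characters that are not "999".
INNER = re.compile(r"(?<=\d{3}(?!999)...)foo")

print(bool(NESTED.search("xbarbaz")), bool(NESTED.search("foobarbaz")))  # True False
print(bool(INNER.search("123abcfoo")), bool(INNER.search("123999foo")))  # True False
```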



CONDITIONAL SUBPATTERNS

-       It  is possible to cause the matching process to obey a subpattern con-
-       ditionally or to choose between two alternative subpatterns,  depending
-       on  the result of an assertion, or whether a specific capturing subpat-
-       tern has already been matched. The two possible  forms  of  conditional
+       It is possible to cause the matching process to obey a subpattern  con-
+       ditionally  or to choose between two alternative subpatterns, depending
+       on the result of an assertion, or whether a specific capturing  subpat-
+       tern  has  already  been matched. The two possible forms of conditional
        subpattern are:


          (?(condition)yes-pattern)
          (?(condition)yes-pattern|no-pattern)


-       If  the  condition is satisfied, the yes-pattern is used; otherwise the
-       no-pattern (if present) is used. If there are more  than  two  alterna-
-       tives  in  the subpattern, a compile-time error occurs. Each of the two
+       If the condition is satisfied, the yes-pattern is used;  otherwise  the
+       no-pattern  (if  present)  is used. If there are more than two alterna-
+       tives in the subpattern, a compile-time error occurs. Each of  the  two
        alternatives may itself contain nested subpatterns of any form, includ-
        ing  conditional  subpatterns;  the  restriction  to  two  alternatives
        applies only at the level of the condition. This pattern fragment is an
@@ -5132,73 +5152,73 @@
          (?(1) (A|B|C) | (D | (?(2)E|F) | E) )



-       There  are  four  kinds of condition: references to subpatterns, refer-
+       There are four kinds of condition: references  to  subpatterns,  refer-
        ences to recursion, a pseudo-condition called DEFINE, and assertions.


    Checking for a used subpattern by number


-       If the text between the parentheses consists of a sequence  of  digits,
+       If  the  text between the parentheses consists of a sequence of digits,
        the condition is true if a capturing subpattern of that number has pre-
-       viously matched. If there is more than one  capturing  subpattern  with
-       the  same  number  (see  the earlier section about duplicate subpattern
-       numbers), the condition is true if any of them have matched. An  alter-
-       native  notation is to precede the digits with a plus or minus sign. In
-       this case, the subpattern number is relative rather than absolute.  The
-       most  recently opened parentheses can be referenced by (?(-1), the next
-       most recent by (?(-2), and so on. Inside loops it can also  make  sense
+       viously  matched.  If  there is more than one capturing subpattern with
+       the same number (see the earlier  section  about  duplicate  subpattern
+       numbers),  the condition is true if any of them have matched. An alter-
+       native notation is to precede the digits with a plus or minus sign.  In
+       this  case, the subpattern number is relative rather than absolute. The
+       most recently opened parentheses can be referenced by (?(-1), the  next
+       most  recent  by (?(-2), and so on. Inside loops it can also make sense
        to refer to subsequent groups. The next parentheses to be opened can be
-       referenced as (?(+1), and so on. (The value zero in any of these  forms
+       referenced  as (?(+1), and so on. (The value zero in any of these forms
        is not used; it provokes a compile-time error.)


-       Consider  the  following  pattern, which contains non-significant white
+       Consider the following pattern, which  contains  non-significant  white
        space to make it more readable (assume the PCRE_EXTENDED option) and to
        divide it into three parts for ease of discussion:


          ( \( )?    [^()]+    (?(1) \) )


-       The  first  part  matches  an optional opening parenthesis, and if that
+       The first part matches an optional opening  parenthesis,  and  if  that
        character is present, sets it as the first captured substring. The sec-
-       ond  part  matches one or more characters that are not parentheses. The
-       third part is a conditional subpattern that tests whether  or  not  the
-       first set of parentheses matched. If they did, that is, if the subject
-       started with an opening parenthesis, the condition is true, and so  the
-       yes-pattern  is  executed and a closing parenthesis is required. Other-
-       wise, since no-pattern is not present, the subpattern matches  nothing.
-       In  other  words,  this  pattern matches a sequence of non-parentheses,
+       ond part matches one or more characters that are not  parentheses.  The
+       third  part  is  a conditional subpattern that tests whether or not the
+       first set of parentheses matched. If they did, that is, if the subject
+       started  with an opening parenthesis, the condition is true, and so the
+       yes-pattern is executed and a closing parenthesis is  required.  Other-
+       wise,  since no-pattern is not present, the subpattern matches nothing.
+       In other words, this pattern matches  a  sequence  of  non-parentheses,
        optionally enclosed in parentheses.
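Python's `re` also supports numbered conditions, so the three-part example can be tried there. This sketch assumes `re.VERBOSE` as the counterpart of PCRE_EXTENDED and does not try the relative form (?(-1)...); PAREN is a hypothetical name.

```python
import re

# Group 1 captures an optional "("; (?(1)...) then requires a closing ")".
# re.VERBOSE ignores the layout spaces, as PCRE_EXTENDED does.
PAREN = re.compile(r"( \( )?    [^()]+    (?(1) \) )", re.VERBOSE)

for subject in ["(abc)", "abc", "(abc", "abc)"]:
    print(subject, bool(PAREN.fullmatch(subject)))
# (abc) True
# abc True
# (abc False
# abc) False
```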


-       If you were embedding this pattern in a larger one,  you  could  use  a
+       If  you  were  embedding  this pattern in a larger one, you could use a
        relative reference:


          ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...


-       This  makes  the  fragment independent of the parentheses in the larger
+       This makes the fragment independent of the parentheses  in  the  larger
        pattern.


    Checking for a used subpattern by name


-       Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a
-       used  subpattern  by  name.  For compatibility with earlier versions of
-       PCRE, which had this facility before Perl, the syntax  (?(name)...)  is
-       also  recognized. However, there is a possible ambiguity with this syn-
-       tax, because subpattern names may  consist  entirely  of  digits.  PCRE
-       looks  first for a named subpattern; if it cannot find one and the name
-       consists entirely of digits, PCRE looks for a subpattern of  that  num-
-       ber,  which must be greater than zero. Using subpattern names that con-
+       Perl  uses  the  syntax  (?(<name>)...) or (?('name')...) to test for a
+       used subpattern by name. For compatibility  with  earlier  versions  of
+       PCRE,  which  had this facility before Perl, the syntax (?(name)...) is
+       also recognized. However, there is a possible ambiguity with this  syn-
+       tax,  because  subpattern  names  may  consist entirely of digits. PCRE
+       looks first for a named subpattern; if it cannot find one and the  name
+       consists  entirely  of digits, PCRE looks for a subpattern of that num-
+       ber, which must be greater than zero. Using subpattern names that  con-
        sist entirely of digits is not recommended.


        Rewriting the above example to use a named subpattern gives this:


          (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )
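For comparison, Python's `re` (a stand-in, not PCRE) names the group with `(?P<OPEN>...)` and tests it with `(?(OPEN)...)`; it does not accept the `(?(<OPEN>)...)` spelling. A minimal sketch, with NAMED as a hypothetical name:

```python
import re

# Python spells the named test (?(OPEN)...) rather than (?(<OPEN>)...).
NAMED = re.compile(r"(?P<OPEN> \( )?    [^()]+    (?(OPEN) \) )", re.VERBOSE)

print(bool(NAMED.fullmatch("(abc)")))  # True
print(bool(NAMED.fullmatch("(abc")))   # False
```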


-       If the name used in a condition of this kind is a duplicate,  the  test
-       is  applied to all subpatterns of the same name, and is true if any one
+       If  the  name used in a condition of this kind is a duplicate, the test
+       is applied to all subpatterns of the same name, and is true if any  one
        of them has matched.


    Checking for pattern recursion


        If the condition is the string (R), and there is no subpattern with the
-       name  R, the condition is true if a recursive call to the whole pattern
+       name R, the condition is true if a recursive call to the whole  pattern
        or any subpattern has been made. If digits or a name preceded by amper-
        sand follow the letter R, for example:


@@ -5206,51 +5226,51 @@

        the condition is true if the most recent recursion is into a subpattern
        whose number or name is given. This condition does not check the entire
-       recursion  stack.  If  the  name  used in a condition of this kind is a
+       recursion stack. If the name used in a condition  of  this  kind  is  a
        duplicate, the test is applied to all subpatterns of the same name, and
        is true if any one of them is the most recent recursion.


-       At  "top  level",  all  these recursion test conditions are false.  The
+       At "top level", all these recursion test  conditions  are  false.   The
        syntax for recursive patterns is described below.


    Defining subpatterns for use by reference only


-       If the condition is the string (DEFINE), and  there  is  no  subpattern
-       with  the  name  DEFINE,  the  condition is always false. In this case,
-       there may be only one alternative  in  the  subpattern.  It  is  always
-       skipped  if  control  reaches  this  point  in the pattern; the idea of
-       DEFINE is that it can be used to define subroutines that can be  refer-
-       enced  from elsewhere. (The use of subroutines is described below.) For
-       example, a pattern to match an IPv4 address  such  as  "192.168.23.245"
+       If  the  condition  is  the string (DEFINE), and there is no subpattern
+       with the name DEFINE, the condition is  always  false.  In  this  case,
+       there  may  be  only  one  alternative  in the subpattern. It is always
+       skipped if control reaches this point  in  the  pattern;  the  idea  of
+       DEFINE  is that it can be used to define subroutines that can be refer-
+       enced from elsewhere. (The use of subroutines is described below.)  For
+       example,  a  pattern  to match an IPv4 address such as "192.168.23.245"
        could be written like this (ignore whitespace and line breaks):


          (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
          \b (?&byte) (\.(?&byte)){3} \b


-       The  first part of the pattern is a DEFINE group inside which a another
-       group named "byte" is defined. This matches an individual component  of
-       an  IPv4  address  (a number less than 256). When matching takes place,
-       this part of the pattern is skipped because DEFINE acts  like  a  false
-       condition.  The  rest of the pattern uses references to the named group
-       to match the four dot-separated components of an IPv4 address,  insist-
+       The first part of the pattern is a DEFINE group inside which a  another
+       group  named "byte" is defined. This matches an individual component of
+       an IPv4 address (a number less than 256). When  matching  takes  place,
+       this  part  of  the pattern is skipped because DEFINE acts like a false
+       condition. The rest of the pattern uses references to the  named  group
+       to  match the four dot-separated components of an IPv4 address, insist-
        ing on a word boundary at each end.


    Assertion conditions


-       If  the  condition  is  not  in any of the above formats, it must be an
-       assertion.  This may be a positive or negative lookahead or  lookbehind
-       assertion.  Consider  this  pattern,  again  containing non-significant
+       If the condition is not in any of the above  formats,  it  must  be  an
+       assertion.   This may be a positive or negative lookahead or lookbehind
+       assertion. Consider  this  pattern,  again  containing  non-significant
        white space, and with the two alternatives on the second line:


          (?(?=[^a-z]*[a-z])
          \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )


-       The condition  is  a  positive  lookahead  assertion  that  matches  an
-       optional  sequence of non-letters followed by a letter. In other words,
-       it tests for the presence of at least one letter in the subject.  If  a
-       letter  is found, the subject is matched against the first alternative;
-       otherwise it is  matched  against  the  second.  This  pattern  matches
-       strings  in  one  of the two forms dd-aaa-dd or dd-dd-dd, where aaa are
+       The  condition  is  a  positive  lookahead  assertion  that  matches an
+       optional sequence of non-letters followed by a letter. In other  words,
+       it  tests  for the presence of at least one letter in the subject. If a
+       letter is found, the subject is matched against the first  alternative;
+       otherwise  it  is  matched  against  the  second.  This pattern matches
+       strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
        letters and dd are digits.



@@ -5259,41 +5279,41 @@
        There are two ways of including comments in patterns that are processed
        by PCRE. In both cases, the start of the comment must not be in a char-
        acter class, nor in the middle of any other sequence of related charac-
-       ters  such  as  (?: or a subpattern name or number. The characters that
+       ters such as (?: or a subpattern name or number.  The  characters  that
        make up a comment play no part in the pattern matching.


-       The sequence (?# marks the start of a comment that continues up to  the
-       next  closing parenthesis. Nested parentheses are not permitted. If the
+       The  sequence (?# marks the start of a comment that continues up to the
+       next closing parenthesis. Nested parentheses are not permitted. If  the
        PCRE_EXTENDED option is set, an unescaped # character also introduces a
-       comment,  which  in  this  case continues to immediately after the next
-       newline character or character sequence in the pattern.  Which  charac-
+       comment, which in this case continues to  immediately  after  the  next
+       newline  character  or character sequence in the pattern. Which charac-
        ters are interpreted as newlines is controlled by the options passed to
        pcre_compile() or by a special sequence at the start of the pattern, as
-       described  in  the  section  entitled "Newline conventions" above. Note
-       that the end of this type of comment is a literal newline  sequence  in
+       described in the section entitled  "Newline  conventions"  above.  Note
+       that  the  end of this type of comment is a literal newline sequence in
        the pattern; escape sequences that happen to represent a newline do not
-       count. For example, consider this pattern when  PCRE_EXTENDED  is  set,
+       count.  For  example,  consider this pattern when PCRE_EXTENDED is set,
        and the default newline convention is in force:


          abc #comment \n still comment


-       On  encountering  the  # character, pcre_compile() skips along, looking
-       for a newline in the pattern. The sequence \n is still literal at  this
-       stage,  so  it does not terminate the comment. Only an actual character
+       On encountering the # character, pcre_compile()  skips  along,  looking
+       for  a newline in the pattern. The sequence \n is still literal at this
+       stage, so it does not terminate the comment. Only an  actual  character
        with the code value 0x0a (the default newline) does so.



RECURSIVE PATTERNS

-       Consider the problem of matching a string in parentheses, allowing  for
-       unlimited  nested  parentheses.  Without the use of recursion, the best
-       that can be done is to use a pattern that  matches  up  to  some  fixed
-       depth  of  nesting.  It  is not possible to handle an arbitrary nesting
+       Consider  the problem of matching a string in parentheses, allowing for
+       unlimited nested parentheses. Without the use of  recursion,  the  best
+       that  can  be  done  is  to use a pattern that matches up to some fixed
+       depth of nesting. It is not possible to  handle  an  arbitrary  nesting
        depth.


        For some time, Perl has provided a facility that allows regular expres-
-       sions  to recurse (amongst other things). It does this by interpolating
-       Perl code in the expression at run time, and the code can refer to  the
+       sions to recurse (amongst other things). It does this by  interpolating
+       Perl  code in the expression at run time, and the code can refer to the
        expression itself. A Perl pattern using code interpolation to solve the
        parentheses problem can be created like this:


@@ -5303,201 +5323,201 @@
        refers recursively to the pattern in which it appears.


        Obviously, PCRE cannot support the interpolation of Perl code. Instead,
-       it supports special syntax for recursion of  the  entire  pattern,  and
-       also  for  individual  subpattern  recursion. After its introduction in
-       PCRE and Python, this kind of  recursion  was  subsequently  introduced
+       it  supports  special  syntax  for recursion of the entire pattern, and
+       also for individual subpattern recursion.  After  its  introduction  in
+       PCRE  and  Python,  this  kind of recursion was subsequently introduced
        into Perl at release 5.10.


-       A  special  item  that consists of (? followed by a number greater than
-       zero and a closing parenthesis is a recursive subroutine  call  of  the
-       subpattern  of  the  given  number, provided that it occurs inside that
-       subpattern. (If not, it is a non-recursive subroutine  call,  which  is
-       described  in  the  next  section.)  The special item (?R) or (?0) is a
+       A special item that consists of (? followed by a  number  greater  than
+       zero  and  a  closing parenthesis is a recursive subroutine call of the
+       subpattern of the given number, provided that  it  occurs  inside  that
+       subpattern.  (If  not,  it is a non-recursive subroutine call, which is
+       described in the next section.) The special item  (?R)  or  (?0)  is  a
        recursive call of the entire regular expression.


-       This PCRE pattern solves the nested  parentheses  problem  (assume  the
+       This  PCRE  pattern  solves  the nested parentheses problem (assume the
        PCRE_EXTENDED option is set so that white space is ignored):


          \( ( [^()]++ | (?R) )* \)


-       First  it matches an opening parenthesis. Then it matches any number of
-       substrings which can either be a  sequence  of  non-parentheses,  or  a
-       recursive  match  of the pattern itself (that is, a correctly parenthe-
+       First it matches an opening parenthesis. Then it matches any number  of
+       substrings  which  can  either  be  a sequence of non-parentheses, or a
+       recursive match of the pattern itself (that is, a  correctly  parenthe-
        sized substring).  Finally there is a closing parenthesis. Note the use
        of a possessive quantifier to avoid backtracking into sequences of non-
        parentheses.


-       If this were part of a larger pattern, you would not  want  to  recurse
+       If  this  were  part of a larger pattern, you would not want to recurse
        the entire pattern, so instead you could use this:


          ( \( ( [^()]++ | (?1) )* \) )


-       We  have  put the pattern into parentheses, and caused the recursion to
+       We have put the pattern into parentheses, and caused the  recursion  to
        refer to them instead of the whole pattern.


-       In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
-       tricky.  This is made easier by the use of relative references. Instead
+       In  a  larger  pattern,  keeping  track  of  parenthesis numbers can be
+       tricky. This is made easier by the use of relative references.  Instead
        of (?1) in the pattern above you can write (?-2) to refer to the second
-       most  recently  opened  parentheses  preceding  the recursion. In other
-       words, a negative number counts capturing  parentheses  leftwards  from
+       most recently opened parentheses  preceding  the  recursion.  In  other
+       words,  a  negative  number counts capturing parentheses leftwards from
        the point at which it is encountered.


-       It  is  also  possible  to refer to subsequently opened parentheses, by
-       writing references such as (?+2). However, these  cannot  be  recursive
-       because  the  reference  is  not inside the parentheses that are refer-
-       enced. They are always non-recursive subroutine calls, as described  in
+       It is also possible to refer to  subsequently  opened  parentheses,  by
+       writing  references  such  as (?+2). However, these cannot be recursive
+       because the reference is not inside the  parentheses  that  are  refer-
+       enced.  They are always non-recursive subroutine calls, as described in
        the next section.


-       An  alternative  approach is to use named parentheses instead. The Perl
-       syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also
+       An alternative approach is to use named parentheses instead.  The  Perl
+       syntax  for  this  is (?&name); PCRE's earlier syntax (?P>name) is also
        supported. We could rewrite the above example as follows:


          (?<pn> \( ( [^()]++ | (?&pn) )* \) )


-       If  there  is more than one subpattern with the same name, the earliest
+       If there is more than one subpattern with the same name,  the  earliest
        one is used.


-       This particular example pattern that we have been looking  at  contains
+       This  particular  example pattern that we have been looking at contains
        nested unlimited repeats, and so the use of a possessive quantifier for
        matching strings of non-parentheses is important when applying the pat-
-       tern  to  strings  that do not match. For example, when this pattern is
+       tern to strings that do not match. For example, when  this  pattern  is
        applied to


          (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()


-       it yields "no match" quickly. However, if a  possessive  quantifier  is
-       not  used, the match runs for a very long time indeed because there are
-       so many different ways the + and * repeats can carve  up  the  subject,
+       it  yields  "no  match" quickly. However, if a possessive quantifier is
+       not used, the match runs for a very long time indeed because there  are
+       so  many  different  ways the + and * repeats can carve up the subject,
        and all have to be tested before failure can be reported.


-       At  the  end  of a match, the values of capturing parentheses are those
-       from the outermost level. If you want to obtain intermediate values,  a
-       callout  function can be used (see below and the pcrecallout documenta-
+       At the end of a match, the values of capturing  parentheses  are  those
+       from  the outermost level. If you want to obtain intermediate values, a
+       callout function can be used (see below and the pcrecallout  documenta-
        tion). If the pattern above is matched against


          (ab(cd)ef)


-       the value for the inner capturing parentheses  (numbered  2)  is  "ef",
-       which  is the last value taken on at the top level. If a capturing sub-
-       pattern is not matched at the top level, its final  captured  value  is
-       unset,  even  if  it was (temporarily) set at a deeper level during the
+       the  value  for  the  inner capturing parentheses (numbered 2) is "ef",
+       which is the last value taken on at the top level. If a capturing  sub-
+       pattern  is  not  matched at the top level, its final captured value is
+       unset, even if it was (temporarily) set at a deeper  level  during  the
        matching process.


-       If there are more than 15 capturing parentheses in a pattern, PCRE  has
-       to  obtain extra memory to store data during a recursion, which it does
+       If  there are more than 15 capturing parentheses in a pattern, PCRE has
+       to obtain extra memory to store data during a recursion, which it  does
        by using pcre_malloc, freeing it via pcre_free afterwards. If no memory
        can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.


-       Do  not  confuse  the (?R) item with the condition (R), which tests for
-       recursion.  Consider this pattern, which matches text in  angle  brack-
-       ets,  allowing for arbitrary nesting. Only digits are allowed in nested
-       brackets (that is, when recursing), whereas any characters are  permit-
+       Do not confuse the (?R) item with the condition (R),  which  tests  for
+       recursion.   Consider  this pattern, which matches text in angle brack-
+       ets, allowing for arbitrary nesting. Only digits are allowed in  nested
+       brackets  (that is, when recursing), whereas any characters are permit-
        ted at the outer level.


          < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >


-       In  this  pattern, (?(R) is the start of a conditional subpattern, with
-       two different alternatives for the recursive and  non-recursive  cases.
+       In this pattern, (?(R) is the start of a conditional  subpattern,  with
+       two  different  alternatives for the recursive and non-recursive cases.
        The (?R) item is the actual recursive call.


    Differences in recursion processing between PCRE and Perl


-       Recursion  processing  in PCRE differs from Perl in two important ways.
-       In PCRE (like Python, but unlike Perl), a recursive subpattern call  is
+       Recursion processing in PCRE differs from Perl in two  important  ways.
+       In  PCRE (like Python, but unlike Perl), a recursive subpattern call is
        always treated as an atomic group. That is, once it has matched some of
        the subject string, it is never re-entered, even if it contains untried
-       alternatives  and  there  is a subsequent matching failure. This can be
-       illustrated by the following pattern, which purports to match a  palin-
-       dromic  string  that contains an odd number of characters (for example,
+       alternatives and there is a subsequent matching failure.  This  can  be
+       illustrated  by the following pattern, which purports to match a palin-
+       dromic string that contains an odd number of characters  (for  example,
        "a", "aba", "abcba", "abcdcba"):


          ^(.|(.)(?1)\2)$


        The idea is that it either matches a single character, or two identical
-       characters  surrounding  a sub-palindrome. In Perl, this pattern works;
-       in PCRE it does not if the pattern is  longer  than  three  characters.
+       characters surrounding a sub-palindrome. In Perl, this  pattern  works;
+       in  PCRE  it  does  not if the pattern is longer than three characters.
        Consider the subject string "abcba":


-       At  the  top level, the first character is matched, but as it is not at
+       At the top level, the first character is matched, but as it is  not  at
        the end of the string, the first alternative fails; the second alterna-
        tive is taken and the recursion kicks in. The recursive call to subpat-
-       tern 1 successfully matches the next character ("b").  (Note  that  the
+       tern  1  successfully  matches the next character ("b"). (Note that the
        beginning and end of line tests are not part of the recursion).


-       Back  at  the top level, the next character ("c") is compared with what
-       subpattern 2 matched, which was "a". This fails. Because the  recursion
-       is  treated  as  an atomic group, there are now no backtracking points,
-       and so the entire match fails. (Perl is able, at  this  point,  to  re-
-       enter  the  recursion  and try the second alternative.) However, if the
+       Back at the top level, the next character ("c") is compared  with  what
+       subpattern  2 matched, which was "a". This fails. Because the recursion
+       is treated as an atomic group, there are now  no  backtracking  points,
+       and  so  the  entire  match fails. (Perl is able, at this point, to re-
+       enter the recursion and try the second alternative.)  However,  if  the
        pattern is written with the alternatives in the other order, things are
        different:


          ^((.)(?1)\2|.)$


-       This  time,  the recursing alternative is tried first, and continues to
-       recurse until it runs out of characters, at which point  the  recursion
-       fails.  But  this  time  we  do  have another alternative to try at the
-       higher level. That is the big difference:  in  the  previous  case  the
+       This time, the recursing alternative is tried first, and  continues  to
+       recurse  until  it runs out of characters, at which point the recursion
+       fails. But this time we do have  another  alternative  to  try  at  the
+       higher  level.  That  is  the  big difference: in the previous case the
        remaining alternative is at a deeper recursion level, which PCRE cannot
        use.


-       To change the pattern so that it matches all palindromic  strings,  not
-       just  those  with an odd number of characters, it is tempting to change
+       To  change  the pattern so that it matches all palindromic strings, not
+       just those with an odd number of characters, it is tempting  to  change
        the pattern to this:


          ^((.)(?1)\2|.?)$


-       Again, this works in Perl, but not in PCRE, and for  the  same  reason.
-       When  a  deeper  recursion has matched a single character, it cannot be
-       entered again in order to match an empty string.  The  solution  is  to
-       separate  the two cases, and write out the odd and even cases as alter-
+       Again,  this  works  in Perl, but not in PCRE, and for the same reason.
+       When a deeper recursion has matched a single character,  it  cannot  be
+       entered  again  in  order  to match an empty string. The solution is to
+       separate the two cases, and write out the odd and even cases as  alter-
        natives at the higher level:


          ^(?:((.)(?1)\2|)|((.)(?3)\4|.))


-       If you want to match typical palindromic phrases, the  pattern  has  to
+       If  you  want  to match typical palindromic phrases, the pattern has to
        ignore all non-word characters, which can be done like this:


          ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$


        If run with the PCRE_CASELESS option, this pattern matches phrases such
        as "A man, a plan, a canal: Panama!" and it works well in both PCRE and
-       Perl.  Note the use of the possessive quantifier *+ to avoid backtrack-
-       ing into sequences of non-word characters. Without this, PCRE  takes  a
-       great  deal  longer  (ten  times or more) to match typical phrases, and
+       Perl. Note the use of the possessive quantifier *+ to avoid  backtrack-
+       ing  into  sequences of non-word characters. Without this, PCRE takes a
+       great deal longer (ten times or more) to  match  typical  phrases,  and
        Perl takes so long that you think it has gone into a loop.


-       WARNING: The palindrome-matching patterns above work only if  the  sub-
-       ject  string  does not start with a palindrome that is shorter than the
-       entire string.  For example, although "abcba" is correctly matched,  if
-       the  subject  is "ababa", PCRE finds the palindrome "aba" at the start,
-       then fails at top level because the end of the string does not  follow.
-       Once  again, it cannot jump back into the recursion to try other alter-
+       WARNING:  The  palindrome-matching patterns above work only if the sub-
+       ject string does not start with a palindrome that is shorter  than  the
+       entire  string.  For example, although "abcba" is correctly matched, if
+       the subject is "ababa", PCRE finds the palindrome "aba" at  the  start,
+       then  fails at top level because the end of the string does not follow.
+       Once again, it cannot jump back into the recursion to try other  alter-
        natives, so the entire match fails.


-       The second way in which PCRE and Perl differ in  their  recursion  pro-
-       cessing  is in the handling of captured values. In Perl, when a subpat-
-       tern is called recursively or as a subpattern (see the  next  section),
-       it  has  no  access to any values that were captured outside the recur-
-       sion, whereas in PCRE these values can  be  referenced.  Consider  this
+       The  second  way  in which PCRE and Perl differ in their recursion pro-
+       cessing is in the handling of captured values. In Perl, when a  subpat-
+       tern  is  called recursively or as a subpattern (see the next section),
+       it has no access to any values that were captured  outside  the  recur-
+       sion,  whereas  in  PCRE  these values can be referenced. Consider this
        pattern:


          ^(.)(\1|a(?2))


-       In  PCRE,  this  pattern matches "bab". The first capturing parentheses
-       match "b", then in the second group, when the back reference  \1  fails
-       to  match "b", the second alternative matches "a" and then recurses. In
-       the recursion, \1 does now match "b" and so the whole  match  succeeds.
-       In  Perl,  the pattern fails to match because inside the recursive call
+       In PCRE, this pattern matches "bab". The  first  capturing  parentheses
+       match  "b",  then in the second group, when the back reference \1 fails
+       to match "b", the second alternative matches "a" and then recurses.  In
+       the  recursion,  \1 does now match "b" and so the whole match succeeds.
+       In Perl, the pattern fails to match because inside the  recursive  call
        \1 cannot access the externally set value.



SUBPATTERNS AS SUBROUTINES

-       If the syntax for a recursive subpattern call (either by number  or  by
-       name)  is  used outside the parentheses to which it refers, it operates
-       like a subroutine in a programming language. The called subpattern  may
-       be  defined  before or after the reference. A numbered reference can be
+       If  the  syntax for a recursive subpattern call (either by number or by
+       name) is used outside the parentheses to which it refers,  it  operates
+       like  a subroutine in a programming language. The called subpattern may
+       be defined before or after the reference. A numbered reference  can  be
        absolute or relative, as in these examples:


          (...(absolute)...)...(?2)...
@@ -5508,179 +5528,179 @@


          (sens|respons)e and \1ibility


-       matches "sense and sensibility" and "response and responsibility",  but
+       matches  "sense and sensibility" and "response and responsibility", but
        not "sense and responsibility". If instead the pattern


          (sens|respons)e and (?1)ibility


-       is  used, it does match "sense and responsibility" as well as the other
-       two strings. Another example is  given  in  the  discussion  of  DEFINE
+       is used, it does match "sense and responsibility" as well as the  other
+       two  strings.  Another  example  is  given  in the discussion of DEFINE
        above.


-       All  subroutine  calls, whether recursive or not, are always treated as
-       atomic groups. That is, once a subroutine has matched some of the  sub-
+       All subroutine calls, whether recursive or not, are always  treated  as
+       atomic  groups. That is, once a subroutine has matched some of the sub-
        ject string, it is never re-entered, even if it contains untried alter-
-       natives and there is  a  subsequent  matching  failure.  Any  capturing
-       parentheses  that  are  set  during the subroutine call revert to their
+       natives  and  there  is  a  subsequent  matching failure. Any capturing
+       parentheses that are set during the subroutine  call  revert  to  their
        previous values afterwards.


-       Processing options such as case-independence are fixed when  a  subpat-
-       tern  is defined, so if it is used as a subroutine, such options cannot
+       Processing  options  such as case-independence are fixed when a subpat-
+       tern is defined, so if it is used as a subroutine, such options  cannot
        be changed for different calls. For example, consider this pattern:


          (abc)(?i:(?-1))


-       It matches "abcabc". It does not match "abcABC" because the  change  of
+       It  matches  "abcabc". It does not match "abcABC" because the change of
        processing option does not affect the called subpattern.



ONIGURUMA SUBROUTINE SYNTAX

-       For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
+       For compatibility with Oniguruma, the non-Perl syntax \g followed by  a
        name or a number enclosed either in angle brackets or single quotes, is
-       an  alternative  syntax  for  referencing a subpattern as a subroutine,
-       possibly recursively. Here are two of the examples used above,  rewrit-
+       an alternative syntax for referencing a  subpattern  as  a  subroutine,
+       possibly  recursively. Here are two of the examples used above, rewrit-
        ten using this syntax:


          (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
          (sens|respons)e and \g'1'ibility


-       PCRE  supports  an extension to Oniguruma: if a number is preceded by a
+       PCRE supports an extension to Oniguruma: if a number is preceded  by  a
        plus or a minus sign it is taken as a relative reference. For example:


          (abc)(?i:\g<-1>)


-       Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not
-       synonymous.  The former is a back reference; the latter is a subroutine
+       Note  that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not
+       synonymous. The former is a back reference; the latter is a  subroutine
        call.



CALLOUTS

        Perl has a feature whereby using the sequence (?{...}) causes arbitrary
-       Perl  code to be obeyed in the middle of matching a regular expression.
+       Perl code to be obeyed in the middle of matching a regular  expression.
        This makes it possible, amongst other things, to extract different sub-
        strings that match the same pair of parentheses when there is a repeti-
        tion.


        PCRE provides a similar feature, but of course it cannot obey arbitrary
        Perl code. The feature is called "callout". The caller of PCRE provides
-       an external function by putting its entry point in the global  variable
-       pcre_callout.   By default, this variable contains NULL, which disables
+       an  external function by putting its entry point in the global variable
+       pcre_callout.  By default, this variable contains NULL, which  disables
        all calling out.


-       Within a regular expression, (?C) indicates the  points  at  which  the
-       external  function  is  to be called. If you want to identify different
-       callout points, you can put a number less than 256 after the letter  C.
-       The  default  value is zero.  For example, this pattern has two callout
+       Within  a  regular  expression,  (?C) indicates the points at which the
+       external function is to be called. If you want  to  identify  different
+       callout  points, you can put a number less than 256 after the letter C.
+       The default value is zero.  For example, this pattern has  two  callout
        points:


          (?C1)abc(?C2)def


        If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are
-       automatically  installed  before each item in the pattern. They are all
+       automatically installed before each item in the pattern. They  are  all
        numbered 255.


        During matching, when PCRE reaches a callout point (and pcre_callout is
-       set),  the  external function is called. It is provided with the number
-       of the callout, the position in the pattern, and, optionally, one  item
-       of  data  originally supplied by the caller of pcre_exec(). The callout
-       function may cause matching to proceed, to backtrack, or to fail  alto-
+       set), the external function is called. It is provided with  the  number
+       of  the callout, the position in the pattern, and, optionally, one item
+       of data originally supplied by the caller of pcre_exec().  The  callout
+       function  may cause matching to proceed, to backtrack, or to fail alto-
        gether. A complete description of the interface to the callout function
        is given in the pcrecallout documentation.



BACKTRACKING CONTROL

-       Perl 5.10 introduced a number of "Special Backtracking Control  Verbs",
+       Perl  5.10 introduced a number of "Special Backtracking Control Verbs",
        which are described in the Perl documentation as "experimental and sub-
-       ject to change or removal in a future version of Perl". It goes  on  to
-       say:  "Their usage in production code should be noted to avoid problems
+       ject  to  change or removal in a future version of Perl". It goes on to
+       say: "Their usage in production code should be noted to avoid  problems
        during upgrades." The same remarks apply to the PCRE features described
        in this section.


-       Since  these  verbs  are  specifically related to backtracking, most of
-       them can be  used  only  when  the  pattern  is  to  be  matched  using
+       Since these verbs are specifically related  to  backtracking,  most  of
+       them  can  be  used  only  when  the  pattern  is  to  be matched using
        pcre_exec(), which uses a backtracking algorithm. With the exception of
        (*FAIL), which behaves like a failing negative assertion, they cause an
        error if encountered by pcre_dfa_exec().


-       If  any of these verbs are used in an assertion or in a subpattern that
+       If any of these verbs are used in an assertion or in a subpattern  that
        is called as a subroutine (whether or not recursively), their effect is
        confined to that subpattern; it does not extend to the surrounding pat-
-       tern, with one exception: a *MARK that is  encountered  in  a  positive
+       tern,  with  one  exception:  a *MARK that is encountered in a positive
        assertion is passed back (compare capturing parentheses in assertions).
        Note that such subpatterns are processed as anchored at the point where
        they are tested. Note also that Perl's treatment of subroutines is dif-
        ferent in some cases.


-       The new verbs make use of what was previously invalid syntax: an  open-
+       The  new verbs make use of what was previously invalid syntax: an open-
        ing parenthesis followed by an asterisk. They are generally of the form
-       (*VERB) or (*VERB:NAME). Some may take either form, with differing  be-
-       haviour,  depending on whether or not an argument is present. A name is
+       (*VERB)  or (*VERB:NAME). Some may take either form, with differing be-
+       haviour, depending on whether or not an argument is present. A name  is
        any sequence of characters that does not include a closing parenthesis.
-       If  the  name is empty, that is, if the closing parenthesis immediately
-       follows the colon, the effect is as if the colon were  not  there.  Any
+       If the name is empty, that is, if the closing  parenthesis  immediately
+       follows  the  colon,  the effect is as if the colon were not there. Any
        number of these verbs may occur in a pattern.


-       PCRE  contains some optimizations that are used to speed up matching by
+       PCRE contains some optimizations that are used to speed up matching  by
        running some checks at the start of each match attempt. For example, it
-       may  know  the minimum length of a matching subject, or that a particular
-       character must be present. When one of these  optimizations  suppresses
-       the  running  of  a match, any included backtracking verbs will not, of
+       may know the minimum length of a matching subject, or that  a  particular
+       character  must  be present. When one of these optimizations suppresses
+       the running of a match, any included backtracking verbs  will  not,  of
        course, be processed. You can suppress the start-of-match optimizations
-       by  setting  the  PCRE_NO_START_OPTIMIZE  option when calling pcre_com-
+       by setting the PCRE_NO_START_OPTIMIZE  option  when  calling  pcre_com-
        pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT).


    Verbs that act immediately


-       The following verbs act as soon as they are encountered. They  may  not
+       The  following  verbs act as soon as they are encountered. They may not
        be followed by a name.


           (*ACCEPT)


-       This  verb causes the match to end successfully, skipping the remainder
-       of the pattern. However, when it is inside a subpattern that is  called
-       as  a  subroutine, only that subpattern is ended successfully. Matching
-       then continues at the outer level. If  (*ACCEPT)  is  inside  capturing
+       This verb causes the match to end successfully, skipping the  remainder
+       of  the pattern. However, when it is inside a subpattern that is called
+       as a subroutine, only that subpattern is ended  successfully.  Matching
+       then  continues  at  the  outer level. If (*ACCEPT) is inside capturing
        parentheses, the data so far is captured. For example:


          A((?:A|B(*ACCEPT)|C)D)


-       This  matches  "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
+       This matches "AB", "AAD", or "ACD"; when it matches "AB", "B"  is  cap-
        tured by the outer parentheses.


          (*FAIL) or (*F)


-       This verb causes a matching failure, forcing backtracking to occur.  It
-       is  equivalent to (?!) but easier to read. The Perl documentation notes
-       that it is probably useful only when combined  with  (?{})  or  (??{}).
-       Those  are,  of course, Perl features that are not present in PCRE. The
-       nearest equivalent is the callout feature, as for example in this  pat-
+       This  verb causes a matching failure, forcing backtracking to occur. It
+       is equivalent to (?!) but easier to read. The Perl documentation  notes
+       that  it  is  probably  useful only when combined with (?{}) or (??{}).
+       Those are, of course, Perl features that are not present in  PCRE.  The
+       nearest  equivalent is the callout feature, as for example in this pat-
        tern:


          a+(?C)(*FAIL)


-       A  match  with the string "aaaa" always fails, but the callout is taken
+       A match with the string "aaaa" always fails, but the callout  is  taken
        before each backtrack happens (in this example, 10 times).


    Recording which path was taken


-       There is one verb whose main purpose  is  to  track  how  a  match  was
-       arrived  at,  though  it  also  has a secondary use in conjunction with
+       There  is  one  verb  whose  main  purpose  is to track how a match was
+       arrived at, though it also has a  secondary  use  in  conjunction  with
        advancing the match starting point (see (*SKIP) below).


          (*MARK:NAME) or (*:NAME)


-       A name is always  required  with  this  verb.  There  may  be  as  many
-       instances  of  (*MARK) as you like in a pattern, and their names do not
+       A  name  is  always  required  with  this  verb.  There  may be as many
+       instances of (*MARK) as you like in a pattern, and their names  do  not
        have to be unique.


-       When a match succeeds, the name  of  the  last-encountered  (*MARK)  is
-       passed  back  to  the  caller  via  the  pcre_extra  data structure, as
+       When  a  match  succeeds,  the  name of the last-encountered (*MARK) is
+       passed back to  the  caller  via  the  pcre_extra  data  structure,  as
        described in the section on pcre_extra in the pcreapi documentation. No
-       data  is  returned  for a partial match. Here is an example of pcretest
-       output, where the /K modifier requests the retrieval and outputting  of
+       data is returned for a partial match. Here is an  example  of  pcretest
+       output,  where the /K modifier requests the retrieval and outputting of
        (*MARK) data:


          /X(*MARK:A)Y|X(*MARK:B)Z/K
@@ -5692,17 +5712,17 @@
          MK: B


        The (*MARK) name is tagged with "MK:" in this output, and in this exam-
-       ple it indicates which of the two alternatives matched. This is a  more
-       efficient  way of obtaining this information than putting each alterna-
+       ple  it indicates which of the two alternatives matched. This is a more
+       efficient way of obtaining this information than putting each  alterna-
        tive in its own capturing parentheses.


        If (*MARK) is encountered in a positive assertion, its name is recorded
        and passed back if it is the last-encountered. This does not happen for
        negative assertions.


-       A name may also be returned after a failed  match  if  the  final  path
-       through  the  pattern involves (*MARK). However, unless (*MARK) is used in
-       conjunction with (*COMMIT), this is unlikely to  happen  for  an  unan-
+       A  name  may  also  be  returned after a failed match if the final path
+       through the pattern involves (*MARK). However, unless (*MARK) is used  in
+       conjunction  with  (*COMMIT),  this  is unlikely to happen for an unan-
        chored pattern because, as the starting point for matching is advanced,
        the final check is often with an empty string, causing a failure before
        (*MARK) is reached. For example:
@@ -5712,56 +5732,56 @@
          No match


        There are three potential starting points for this match (starting with
-       X, starting with P, and with  an  empty  string).  If  the  pattern  is
+       X,  starting  with  P,  and  with  an  empty string). If the pattern is
        anchored, the result is different:


          /^X(*MARK:A)Y|^X(*MARK:B)Z/K
          XP
          No match, mark = B


-       PCRE's  start-of-match  optimizations can also interfere with this. For
-       example, if, as a result of a call to pcre_study(), it knows the  mini-
-       mum  subject  length for a match, a shorter subject will not be scanned
+       PCRE's start-of-match optimizations can also interfere with  this.  For
+       example,  if, as a result of a call to pcre_study(), it knows the mini-
+       mum subject length for a match, a shorter subject will not  be  scanned
        at all.


        Note that similar anomalies (though different in detail) exist in Perl,
-       no  doubt  for the same reasons. The use of (*MARK) data after a failed
-       match of an unanchored pattern is not recommended, unless (*COMMIT)  is
+       no doubt for the same reasons. The use of (*MARK) data after  a  failed
+       match  of an unanchored pattern is not recommended, unless (*COMMIT) is
        involved.


    Verbs that act after backtracking


        The following verbs do nothing when they are encountered. Matching con-
-       tinues with what follows, but if there is no subsequent match,  causing
-       a  backtrack  to  the  verb, a failure is forced. That is, backtracking
-       cannot pass to the left of the verb. However, when one of  these  verbs
-       appears  inside  an atomic group, its effect is confined to that group,
-       because once the group has been matched, there is never any  backtrack-
-       ing  into  it.  In  this situation, backtracking can "jump back" to the
-       left of the entire atomic group. (Remember also, as stated above,  that
+       tinues  with what follows, but if there is no subsequent match, causing
+       a backtrack to the verb, a failure is  forced.  That  is,  backtracking
+       cannot  pass  to the left of the verb. However, when one of these verbs
+       appears inside an atomic group, its effect is confined to  that  group,
+       because  once the group has been matched, there is never any backtrack-
+       ing into it. In this situation, backtracking can  "jump  back"  to  the
+       left  of the entire atomic group. (Remember also, as stated above, that
        this localization also applies in subroutine calls and assertions.)


-       These  verbs  differ  in exactly what kind of failure occurs when back-
+       These verbs differ in exactly what kind of failure  occurs  when  back-
        tracking reaches them.


          (*COMMIT)


-       This verb, which may not be followed by a name, causes the whole  match
+       This  verb, which may not be followed by a name, causes the whole match
        to fail outright if the rest of the pattern does not match. Even if the
        pattern is unanchored, no further attempts to find a match by advancing
        the  starting  point  take  place.  Once  (*COMMIT)  has  been  passed,
-       pcre_exec() is committed to finding a match  at  the  current  starting
+       pcre_exec()  is  committed  to  finding a match at the current starting
        point, or not at all. For example:


          a+(*COMMIT)b


-       This  matches  "xxaab" but not "aacaab". It can be thought of as a kind
+       This matches "xxaab" but not "aacaab". It can be thought of as  a  kind
        of dynamic anchor, or "I've started, so I must finish." The name of the
-       most  recently passed (*MARK) in the path is passed back when (*COMMIT)
+       most recently passed (*MARK) in the path is passed back when  (*COMMIT)
        forces a match failure.


-       Note that (*COMMIT) at the start of a pattern is not  the  same  as  an
-       anchor,  unless  PCRE's start-of-match optimizations are turned off, as
+       Note  that  (*COMMIT)  at  the start of a pattern is not the same as an
+       anchor, unless PCRE's start-of-match optimizations are turned  off,  as
        shown in this pcretest example:


          /(*COMMIT)abc/
@@ -5770,115 +5790,115 @@
          xyzabc\Y
          No match


-       PCRE knows that any match must start  with  "a",  so  the  optimization
-       skips  along the subject to "a" before running the first match attempt,
-       which succeeds. When the optimization is disabled by the \Y  escape  in
+       PCRE  knows  that  any  match  must start with "a", so the optimization
+       skips along the subject to "a" before running the first match  attempt,
+       which  succeeds.  When the optimization is disabled by the \Y escape in
        the second subject, the match starts at "x" and so the (*COMMIT) causes
        it to fail without trying any other starting points.


          (*PRUNE) or (*PRUNE:NAME)


-       This verb causes the match to fail at the current starting position  in
-       the  subject  if the rest of the pattern does not match. If the pattern
-       is unanchored, the normal "bumpalong"  advance  to  the  next  starting
-       character  then happens. Backtracking can occur as usual to the left of
-       (*PRUNE), before it is reached,  or  when  matching  to  the  right  of
-       (*PRUNE),  but  if  there is no match to the right, backtracking cannot
-       cross (*PRUNE). In simple cases, the use of (*PRUNE) is just an  alter-
-       native  to an atomic group or possessive quantifier, but there are some
+       This  verb causes the match to fail at the current starting position in
+       the subject if the rest of the pattern does not match. If  the  pattern
+       is  unanchored,  the  normal  "bumpalong"  advance to the next starting
+       character then happens. Backtracking can occur as usual to the left  of
+       (*PRUNE),  before  it  is  reached,  or  when  matching to the right of
+       (*PRUNE), but if there is no match to the  right,  backtracking  cannot
+       cross  (*PRUNE). In simple cases, the use of (*PRUNE) is just an alter-
+       native to an atomic group or possessive quantifier, but there are  some
        uses of (*PRUNE) that cannot be expressed in any other way.  The behav-
-       iour  of  (*PRUNE:NAME)  is  the  same as (*MARK:NAME)(*PRUNE) when the
-       match fails completely; the name is passed back if this  is  the  final
-       attempt.   (*PRUNE:NAME)  does  not  pass back a name if the match suc-
-       ceeds. In an anchored pattern (*PRUNE) has the same  effect  as  (*COM-
+       iour of (*PRUNE:NAME) is the  same  as  (*MARK:NAME)(*PRUNE)  when  the
+       match  fails  completely;  the name is passed back if this is the final
+       attempt.  (*PRUNE:NAME) does not pass back a name  if  the  match  suc-
+       ceeds.  In  an  anchored pattern (*PRUNE) has the same effect as (*COM-
        MIT).


          (*SKIP)


-       This  verb, when given without a name, is like (*PRUNE), except that if
-       the pattern is unanchored, the "bumpalong" advance is not to  the  next
+       This verb, when given without a name, is like (*PRUNE), except that  if
+       the  pattern  is unanchored, the "bumpalong" advance is not to the next
        character, but to the position in the subject where (*SKIP) was encoun-
-       tered. (*SKIP) signifies that whatever text was matched leading  up  to
+       tered.  (*SKIP)  signifies that whatever text was matched leading up to
        it cannot be part of a successful match. Consider:


          a+(*SKIP)b


-       If  the  subject  is  "aaaac...",  after  the first match attempt fails
-       (starting at the first character in the  string),  the  starting  point
+       If the subject is "aaaac...",  after  the  first  match  attempt  fails
+       (starting  at  the  first  character in the string), the starting point
        skips on to start the next attempt at "c". Note that a possessive quan-
-       tifier does not have the same effect as this example; although it  would
-       suppress  backtracking  during  the  first  match  attempt,  the second
-       attempt would start at the second character instead of skipping  on  to
+       tifier  does not have the same effect as this example; although it would
+       suppress backtracking  during  the  first  match  attempt,  the  second
+       attempt  would  start at the second character instead of skipping on to
        "c".


          (*SKIP:NAME)


-       When  (*SKIP) has an associated name, its behaviour is modified. If the
+       When (*SKIP) has an associated name, its behaviour is modified. If  the
        following pattern fails to match, the previous path through the pattern
-       is  searched for the most recent (*MARK) that has the same name. If one
-       is found, the "bumpalong" advance is to the subject position that  cor-
-       responds  to  that (*MARK) instead of to where (*SKIP) was encountered.
-       If no (*MARK) with a matching name is found, normal "bumpalong" of  one
+       is searched for the most recent (*MARK) that has the same name. If  one
+       is  found, the "bumpalong" advance is to the subject position that cor-
+       responds to that (*MARK) instead of to where (*SKIP)  was  encountered.
+       If  no (*MARK) with a matching name is found, normal "bumpalong" of one
        character happens (that is, the (*SKIP) is ignored).


          (*THEN) or (*THEN:NAME)


-       This  verb  causes a skip to the next innermost alternative if the rest
-       of the pattern does not match. That is, it cancels  pending  backtrack-
-       ing,  but  only within the current alternative. Its name comes from the
+       This verb causes a skip to the next innermost alternative if  the  rest
+       of  the  pattern does not match. That is, it cancels pending backtrack-
+       ing, but only within the current alternative. Its name comes  from  the
        observation that it can be used for a pattern-based if-then-else block:


          ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...


-       If the COND1 pattern matches, FOO is tried (and possibly further  items
-       after  the  end  of the group if FOO succeeds); on failure, the matcher
-       skips to the second alternative and tries COND2,  without  backtracking
-       into  COND1.  The  behaviour  of  (*THEN:NAME)  is  exactly the same as
-       (*MARK:NAME)(*THEN) if the overall  match  fails.  If  (*THEN)  is  not
+       If  the COND1 pattern matches, FOO is tried (and possibly further items
+       after the end of the group if FOO succeeds); on  failure,  the  matcher
+       skips  to  the second alternative and tries COND2, without backtracking
+       into COND1. The behaviour  of  (*THEN:NAME)  is  exactly  the  same  as
+       (*MARK:NAME)(*THEN)  if  the  overall  match  fails.  If (*THEN) is not
        inside an alternation, it acts like (*PRUNE).


-       Note  that  a  subpattern that does not contain a | character is just a
-       part of the enclosing alternative; it is not a nested alternation  with
-       only  one alternative. The effect of (*THEN) extends beyond such a sub-
-       pattern to the enclosing alternative. Consider this pattern,  where  A,
+       Note that a subpattern that does not contain a | character  is  just  a
+       part  of the enclosing alternative; it is not a nested alternation with
+       only one alternative. The effect of (*THEN) extends beyond such a  sub-
+       pattern  to  the enclosing alternative. Consider this pattern, where A,
        B, etc. are complex pattern fragments that do not contain any | charac-
        ters at this level:


          A (B(*THEN)C) | D


-       If A and B are matched, but there is a failure in C, matching does  not
+       If  A and B are matched, but there is a failure in C, matching does not
        backtrack into A; instead it moves to the next alternative, that is, D.
-       However, if the subpattern containing (*THEN) is given an  alternative,
+       However,  if the subpattern containing (*THEN) is given an alternative,
        it behaves differently:


          A (B(*THEN)C | (*FAIL)) | D


-       The  effect of (*THEN) is now confined to the inner subpattern. After a
+       The effect of (*THEN) is now confined to the inner subpattern. After  a
        failure in C, matching moves to (*FAIL), which causes the whole subpat-
-       tern  to  fail  because  there are no more alternatives to try. In this
+       tern to fail because there are no more alternatives  to  try.  In  this
        case, matching does now backtrack into A.


        Note also that a conditional subpattern is not considered as having two
-       alternatives,  because  only  one  is  ever used. In other words, the |
+       alternatives, because only one is ever used.  In  other  words,  the  |
        character in a conditional subpattern has a different meaning. Ignoring
        white space, consider:


          ^.*? (?(?=a) a | b(*THEN)c )


-       If  the  subject  is  "ba", this pattern does not match. Because .*? is
-       ungreedy, it initially matches zero  characters.  The  condition  (?=a)
-       then  fails,  the  character  "b"  is  matched, but "c" is not. At this
-       point, matching does not backtrack to .*? as might perhaps be  expected
-       from  the  presence  of  the | character. The conditional subpattern is
+       If the subject is "ba", this pattern does not  match.  Because  .*?  is
+       ungreedy,  it  initially  matches  zero characters. The condition (?=a)
+       then fails, the character "b" is matched,  but  "c"  is  not.  At  this
+       point,  matching does not backtrack to .*? as might perhaps be expected
+       from the presence of the | character.  The  conditional  subpattern  is
        part of the single alternative that comprises the whole pattern, and so
-       the  match  fails.  (If  there was a backtrack into .*?, allowing it to
+       the match fails. (If there was a backtrack into  .*?,  allowing  it  to
        match "b", the match would succeed.)


-       The verbs just described provide four different "strengths" of  control
+       The  verbs just described provide four different "strengths" of control
        when subsequent matching fails. (*THEN) is the weakest, carrying on the
-       match at the next alternative. (*PRUNE) comes next, failing  the  match
-       at  the  current starting position, but allowing an advance to the next
-       character (for an unanchored pattern). (*SKIP) is similar, except  that
+       match  at  the next alternative. (*PRUNE) comes next, failing the match
+       at the current starting position, but allowing an advance to  the  next
+       character  (for an unanchored pattern). (*SKIP) is similar, except that
        the advance may be more than one character. (*COMMIT) is the strongest,
        causing the entire match to fail.


@@ -5888,8 +5908,8 @@

          (A(*COMMIT)B(*THEN)C|D)


-       Once A has matched, PCRE is committed to this  match,  at  the  current
-       starting  position. If subsequently B matches, but C does not, the nor-
+       Once  A  has  matched,  PCRE is committed to this match, at the current
+       starting position. If subsequently B matches, but C does not, the  nor-
        mal (*THEN) action of trying the next alternative (that is, D) does not
        happen because (*COMMIT) overrides.


@@ -5908,7 +5928,7 @@

REVISION

-       Last updated: 09 October 2011
+       Last updated: 19 October 2011
        Copyright (c) 1997-2011 University of Cambridge.
 ------------------------------------------------------------------------------


@@ -6383,42 +6403,43 @@
        gle byte.


        5.  The  escape sequence \C can be used to match a single byte in UTF-8
-       mode, but its use can lead to some strange effects.  This  facility  is
-       not  available  in  the alternative matching function, pcre_dfa_exec(),
-       nor is it supported by the JIT  optimization  of  pcre_exec().  If  JIT
-       optimization  is  requested for a pattern that contains \C, it will not
-       succeed, and so the matching will be carried out by the  normal  inter-
-       pretive function.
+       mode, but its use can lead to some strange effects because it breaks up
+       multibyte characters (see the description of \C in the pcrepattern doc-
+       umentation). The use of \C is not supported in the alternative matching
+       function  pcre_dfa_exec(), nor is it supported in UTF-8 mode by the JIT
+       optimization of pcre_exec(). If JIT optimization  is  requested  for  a
+       UTF-8  pattern that contains \C, it will not succeed, and so the match-
+       ing will be carried out by the normal interpretive function.


-       6.  The  character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
+       6. The character escapes \b, \B, \d, \D, \s, \S, \w, and  \W  correctly
        test characters of any code value, but, by default, the characters that
-       PCRE  recognizes  as digits, spaces, or word characters remain the same
-       set as before, all with values less than 256. This  remains  true  even
-       when  PCRE  is built to include Unicode property support, because to do
+       PCRE recognizes as digits, spaces, or word characters remain  the  same
+       set  as  before,  all with values less than 256. This remains true even
+       when PCRE is built to include Unicode property support, because  to  do
        otherwise would slow down PCRE in many common cases. Note in particular
        that this applies to \b and \B, because they are defined in terms of \w
-       and \W. If you really want to test for a wider sense of, say,  "digit",
-       you  can  use  explicit Unicode property tests such as \p{Nd}. Alterna-
-       tively, if you set the PCRE_UCP option,  the  way  that  the  character
-       escapes  work  is changed so that Unicode properties are used to deter-
-       mine which characters match. There are more details in the  section  on
+       and  \W. If you really want to test for a wider sense of, say, "digit",
+       you can use explicit Unicode property tests such  as  \p{Nd}.  Alterna-
+       tively,  if  you  set  the  PCRE_UCP option, the way that the character
+       escapes work is changed so that Unicode properties are used  to  deter-
+       mine  which  characters match. There are more details in the section on
        generic character types in the pcrepattern documentation.


-       7.  Similarly,  characters that match the POSIX named character classes
+       7. Similarly, characters that match the POSIX named  character  classes
        are all low-valued characters, unless the PCRE_UCP option is set.


-       8. However, the horizontal and  vertical  whitespace  matching  escapes
-       (\h,  \H,  \v, and \V) do match all the appropriate Unicode characters,
+       8.  However,  the  horizontal  and vertical whitespace matching escapes
+       (\h, \H, \v, and \V) do match all the appropriate  Unicode  characters,
        whether or not PCRE_UCP is set.


-       9. Case-insensitive matching applies only to  characters  whose  values
-       are  less than 128, unless PCRE is built with Unicode property support.
-       Even when Unicode property support is available, PCRE  still  uses  its
-       own  character  tables when checking the case of low-valued characters,
-       so as not to degrade performance.  The Unicode property information  is
+       9.  Case-insensitive  matching  applies only to characters whose values
+       are less than 128, unless PCRE is built with Unicode property  support.
+       Even  when  Unicode  property support is available, PCRE still uses its
+       own character tables when checking the case of  low-valued  characters,
+       so  as not to degrade performance.  The Unicode property information is
        used only for characters with higher values. Furthermore, PCRE supports
-       case-insensitive matching only  when  there  is  a  one-to-one  mapping
-       between  a letter's cases. There are a small number of many-to-one map-
+       case-insensitive  matching  only  when  there  is  a one-to-one mapping
+       between a letter's cases. There are a small number of many-to-one  map-
        pings in Unicode; these are not supported by PCRE.



@@ -6431,7 +6452,7 @@

REVISION

-       Last updated: 06 September 2011
+       Last updated: 19 October 2011
        Copyright (c) 1997-2011 University of Cambridge.
 ------------------------------------------------------------------------------


@@ -6534,7 +6555,7 @@

        The unsupported pattern items are:


-         \C            match a single byte, even in UTF-8 mode
+         \C            match a single byte; not supported in UTF-8 mode
          (?Cn)          callouts
          (?(<name>)...  conditional test on setting of a named subpattern
          (?(R)...       conditional test on whole pattern recursion
@@ -6691,7 +6712,7 @@


REVISION

-       Last updated: 05 October 2011
+       Last updated: 19 October 2011
        Copyright (c) 1997-2011 University of Cambridge.
 ------------------------------------------------------------------------------



Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2011-10-19 17:37:29 UTC (rev 737)
+++ code/trunk/doc/pcrepattern.3    2011-10-21 09:04:01 UTC (rev 738)
@@ -989,23 +989,23 @@
 the lookbehind.
 .P
 In general, the \eC escape sequence is best avoided in UTF-8 mode. However, one
-way of using it that avoids the problem of malformed UTF-8 characters is to 
-use a lookahead to check the length of the next character, as in this pattern 
+way of using it that avoids the problem of malformed UTF-8 characters is to
+use a lookahead to check the length of the next character, as in this pattern
 (ignore white space and line breaks):
 .sp
   (?| (?=[\ex00-\ex7f])(\eC) |
-      (?=[\ex80-\ex{7ff}])(\eC)(\eC) |  
-      (?=[\ex{800}-\ex{ffff}])(\eC)(\eC)(\eC) |  
+      (?=[\ex80-\ex{7ff}])(\eC)(\eC) |
+      (?=[\ex{800}-\ex{ffff}])(\eC)(\eC)(\eC) |
       (?=[\ex{10000}-\ex{1fffff}])(\eC)(\eC)(\eC)(\eC))
 .sp
-A group that starts with (?| resets the capturing parentheses numbers in each 
-alternative (see 
+A group that starts with (?| resets the capturing parentheses numbers in each
+alternative (see
 .\" HTML <a href="#dupsubpatternnumber">
 .\" </a>
 "Duplicate Subpattern Numbers"
 .\"
-below). The assertions at the start of each branch check the next UTF-8 
-character for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The 
+below). The assertions at the start of each branch check the next UTF-8
+character for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The
 character's individual bytes are then captured by the appropriate number of
 groups.
 .


Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c    2011-10-19 17:37:29 UTC (rev 737)
+++ code/trunk/pcretest.c    2011-10-21 09:04:01 UTC (rev 738)
@@ -2346,12 +2346,12 @@
           {
           unsigned char *pt = p;
           c = 0;
-          
+
           /* We used to have "while (isxdigit(*(++pt)))" here, but it fails
           when isxdigit() is a macro that refers to its argument more than
           once. This is banned by the C Standard, but apparently happens in at
           least one MacOS environment. */
-          
+
           for (pt++; isxdigit(*pt); pt++)
             c = c * 16 + tolower(*pt) - ((isdigit(*pt))? '0' : 'a' - 10);
           if (*pt == '}')
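The rewritten loop in the hunk above never passes an expression with side effects (such as *(++pt)) to isxdigit(), so it stays correct even where isxdigit() is a macro that evaluates its argument more than once. The hunk is cut off at the closing-brace test, so the following is only a minimal sketch: it wraps the same loop in a hypothetical helper, decode_braced_hex(), which is not a pcretest function and which assumes it is handed a pointer to the '{' of a \x{...} escape. The digit arithmetic relies on '0' to '9' being contiguous, which the C standard guarantees, and on 'a' to 'f' being contiguous, which holds in both ASCII and EBCDIC. Like the excerpt, it does not check for overflow on very long digit strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper built around the pcretest.c loop: p points at the
   '{' of a \x{...} escape. On success the hex value is stored in *value and
   a pointer to the closing '}' is returned; if the digits are not followed
   by '}', NULL is returned and *value is left unchanged. Each macro argument
   is the plain expression *pt, so multiple evaluation is harmless. */
static const unsigned char *decode_braced_hex(const unsigned char *p,
                                              unsigned int *value)
{
  const unsigned char *pt;
  unsigned int c = 0;

  assert(p != NULL && *p == '{' && value != NULL);

  /* Advance the pointer in the for-statement, never inside isxdigit(). */
  for (pt = p + 1; isxdigit(*pt); pt++)
    c = c * 16 + tolower(*pt) - ((isdigit(*pt))? '0' : 'a' - 10);

  if (*pt != '}') return NULL;
  *value = c;
  return pt;
}
```

Returning a pointer to the '}' lets a caller resume scanning just after the escape, which is the same position the excerpt's pt variable reaches.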