[Pcre-svn] [959] code/trunk/doc: Update documentation to cla…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [959] code/trunk/doc: Update documentation to clarify that UTF-8/16 checking is done on complete
Revision: 959
          http://vcs.pcre.org/viewvc?view=rev&revision=959
Author:   ph10
Date:     2012-04-14 17:16:58 +0100 (Sat, 14 Apr 2012)


Log Message:
-----------
Update documentation to clarify that UTF-8/16 checking is done on complete
strings before any other processing.
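
Illustrative note (not part of the commit): the behaviour being documented is
that the whole subject is validated up front, so an invalid sequence near the
end of a string is reported before any matching is attempted. The sketch below
shows what such a whole-string pass might look like under the RFC 3629 rules
quoted in the pcreunicode page (U+0 to U+10FFFF, excluding U+D800 to U+DFFF,
no overlong forms). The function name first_invalid_utf8() is hypothetical and
this is not PCRE's internal checking routine.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not a PCRE function).
   Scans the entire buffer and returns the byte offset of the first invalid
   UTF-8 sequence, or -1 if the whole buffer is valid per RFC 3629. */
long first_invalid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra, k;
        unsigned long cp, min;

        if (c < 0x80) {            /* single-byte (ASCII) character */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return (long)i;        /* stray continuation byte or 5/6-byte lead */
        }

        if (len - i <= extra)
            return (long)i;        /* character truncated at end of buffer */

        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return (long)i;    /* expected a continuation byte */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        /* Reject overlong encodings, values above U+10FFFF, and surrogates. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (long)i;

        i += extra + 1;
    }
    return -1;
}
```

A caller that has already validated a long subject once in this way could
reasonably pass PCRE_NO_UTF8_CHECK on later pcre_exec() calls against the same
string, which is the use case the updated pcreunicode text mentions.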

Modified Paths:
--------------
    code/trunk/doc/html/index.html
    code/trunk/doc/html/pcre16.html
    code/trunk/doc/html/pcreapi.html
    code/trunk/doc/html/pcrejit.html
    code/trunk/doc/html/pcrepattern.html
    code/trunk/doc/html/pcreunicode.html
    code/trunk/doc/index.html.src
    code/trunk/doc/pcre.txt
    code/trunk/doc/pcre16.3
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcrejit.3
    code/trunk/doc/pcrepattern.3
    code/trunk/doc/pcreunicode.3


Modified: code/trunk/doc/html/index.html
===================================================================
--- code/trunk/doc/html/index.html    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/html/index.html    2012-04-14 16:16:58 UTC (rev 959)
@@ -82,7 +82,7 @@
     <td>&nbsp;&nbsp;The <b>pcretest</b> command for testing PCRE</td></tr>


 <tr><td><a href="pcreunicode.html">pcreunicode</a></td>
-    <td>&nbsp;&nbsp;Discussion of Unicode and UTF-8 support</td></tr>
+    <td>&nbsp;&nbsp;Discussion of Unicode and UTF-8/UTF-16 support</td></tr>
 </table>


<p>

Modified: code/trunk/doc/html/pcre16.html
===================================================================
--- code/trunk/doc/html/pcre16.html    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/html/pcre16.html    2012-04-14 16:16:58 UTC (rev 959)
@@ -273,7 +273,12 @@
 <P>
 There are two new general option names, PCRE_UTF16 and PCRE_NO_UTF16_CHECK,
 which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In
-fact, these new options define the same bits in the options word.
+fact, these new options define the same bits in the options word. There is a 
+discussion about the
+<a href="pcreunicode.html#utf16strings">validity of UTF-16 strings</a>
+in the
+<a href="pcreunicode.html"><b>pcreunicode</b></a>
+page. 
 </P>
 <P>
 For the <b>pcre16_config()</b> function there is an option PCRE_CONFIG_UTF16
@@ -368,7 +373,7 @@
 </P>
 <br><a name="SEC22" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 08 January 2012
+Last updated: 14 April 2012
 <br>
 Copyright &copy; 1997-2012 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcreapi.html
===================================================================
--- code/trunk/doc/html/pcreapi.html    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/html/pcreapi.html    2012-04-14 16:16:58 UTC (rev 959)
@@ -1724,9 +1724,11 @@
 </pre>
 When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8
 string is automatically checked when <b>pcre_exec()</b> is subsequently called.
-The value of <i>startoffset</i> is also checked to ensure that it points to the
-start of a UTF-8 character. There is a discussion about the validity of UTF-8
-strings in the
+The entire string is checked before any other processing takes place. The value
+of <i>startoffset</i> is also checked to ensure that it points to the start of a
+UTF-8 character. There is a discussion about the
+<a href="pcreunicode.html#utf8strings">validity of UTF-8 strings</a>
+in the
 <a href="pcreunicode.html"><b>pcreunicode</b></a>
 page. If an invalid sequence of bytes is found, <b>pcre_exec()</b> returns the
 error PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a
@@ -2608,7 +2610,7 @@
 </P>
 <br><a name="SEC26" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 24 February 2012
+Last updated: 14 April 2012
 <br>
 Copyright &copy; 1997-2012 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcrejit.html
===================================================================
--- code/trunk/doc/html/pcrejit.html    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/html/pcrejit.html    2012-04-14 16:16:58 UTC (rev 959)
@@ -161,8 +161,8 @@
 <br><a name="SEC5" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br>
 <P>
 The only <b>pcre_exec()</b> options that are supported for JIT execution are
-PCRE_NO_UTF8_CHECK, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,
-PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT.
+PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK, PCRE_NOTBOL, PCRE_NOTEOL,
+PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT.
 </P>
 <P>
 The unsupported pattern items are:
@@ -415,7 +415,7 @@
 </P>
 <br><a name="SEC13" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 February 2012
+Last updated: 14 April 2012
 <br>
 Copyright &copy; 1997-2012 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcrepattern.html
===================================================================
--- code/trunk/doc/html/pcrepattern.html    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/html/pcrepattern.html    2012-04-14 16:16:58 UTC (rev 959)
@@ -1018,7 +1018,8 @@
 unit with \C in a UTF mode means that the rest of the string may start with a
 malformed UTF character. This has undefined results, because PCRE assumes that
 it is dealing with valid UTF strings (and by default it checks this at the
-start of processing unless the PCRE_NO_UTF8_CHECK option is used).
+start of processing unless the PCRE_NO_UTF8_CHECK or PCRE_NO_UTF16_CHECK option
+is used).
 </P>
 <P>
 PCRE does not allow \C to appear in lookbehind assertions
@@ -2867,7 +2868,7 @@
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 24 February 2012
+Last updated: 14 April 2012
 <br>
 Copyright &copy; 1997-2012 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcreunicode.html
===================================================================
--- code/trunk/doc/html/pcreunicode.html    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/html/pcreunicode.html    2012-04-14 16:16:58 UTC (rev 959)
@@ -74,11 +74,12 @@
 <P>
 When you set the PCRE_UTF8 flag, the byte strings passed as patterns and
 subjects are (by default) checked for validity on entry to the relevant
-functions. From release 7.3 of PCRE, the check is according the rules of RFC
-3629, which are themselves derived from the Unicode specification. Earlier
-releases of PCRE followed the rules of RFC 2279, which allows the full range of
-31-bit values (0 to 0x7FFFFFFF). The current check allows only values in the
-range U+0 to U+10FFFF, excluding U+D800 to U+DFFF.
+functions. The entire string is checked before any other processing takes
+place. From release 7.3 of PCRE, the check is according the rules of RFC 3629,
+which are themselves derived from the Unicode specification. Earlier releases
+of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit
+values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0
+to U+10FFFF, excluding U+D800 to U+DFFF.
 </P>
 <P>
 The excluded code points are the "Surrogate Area" of Unicode. They are reserved
@@ -96,10 +97,12 @@
 </P>
 <P>
 In some situations, you may already know that your strings are valid, and
-therefore want to skip these checks in order to improve performance. If you set
-the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE assumes that
-the pattern or subject it is given (respectively) contains only valid UTF-8
-codes. In this case, it does not diagnose an invalid UTF-8 string.
+therefore want to skip these checks in order to improve performance, for
+example in the case of a long subject string that is being scanned repeatedly
+with different patterns. If you set the PCRE_NO_UTF8_CHECK flag at compile time
+or at run time, PCRE assumes that the pattern or subject it is given
+(respectively) contains only valid UTF-8 codes. In this case, it does not
+diagnose an invalid UTF-8 string.
 </P>
 <P>
 If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what
@@ -228,7 +231,7 @@
 REVISION
 </b><br>
 <P>
-Last updated: 13 January 2012
+Last updated: 14 April 2012
 <br>
 Copyright &copy; 1997-2012 University of Cambridge.
 <br>


Modified: code/trunk/doc/index.html.src
===================================================================
--- code/trunk/doc/index.html.src    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/index.html.src    2012-04-14 16:16:58 UTC (rev 959)
@@ -82,7 +82,7 @@
     <td>&nbsp;&nbsp;The <b>pcretest</b> command for testing PCRE</td></tr>


 <tr><td><a href="pcreunicode.html">pcreunicode</a></td>
-    <td>&nbsp;&nbsp;Discussion of Unicode and UTF-8 support</td></tr>
+    <td>&nbsp;&nbsp;Discussion of Unicode and UTF-8/UTF-16 support</td></tr>
 </table>


<p>

Modified: code/trunk/doc/pcre.txt
===================================================================
--- code/trunk/doc/pcre.txt    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/pcre.txt    2012-04-14 16:16:58 UTC (rev 959)
@@ -367,47 +367,48 @@
        There   are   two   new   general   option   names,   PCRE_UTF16    and
        PCRE_NO_UTF16_CHECK,     which     correspond    to    PCRE_UTF8    and
        PCRE_NO_UTF8_CHECK in the 8-bit library. In  fact,  these  new  options
-       define the same bits in the options word.
+       define  the  same bits in the options word. There is a discussion about
+       the validity of UTF-16 strings in the pcreunicode page.


-       For  the  pcre16_config() function there is an option PCRE_CONFIG_UTF16
-       that returns 1 if UTF-16 support is configured, otherwise  0.  If  this
-       option  is given to pcre_config(), or if the PCRE_CONFIG_UTF8 option is
+       For the pcre16_config() function there is an  option  PCRE_CONFIG_UTF16
+       that  returns  1  if UTF-16 support is configured, otherwise 0. If this
+       option is given to pcre_config(), or if the PCRE_CONFIG_UTF8 option  is
        given to pcre16_config(), the result is the PCRE_ERROR_BADOPTION error.



CHARACTER CODES

-       In 16-bit mode, when  PCRE_UTF16  is  not  set,  character  values  are
+       In  16-bit  mode,  when  PCRE_UTF16  is  not  set, character values are
        treated in the same way as in 8-bit, non UTF-8 mode, except, of course,
-       that they can range from 0 to 0xffff instead of 0  to  0xff.  Character
-       types  for characters less than 0xff can therefore be influenced by the
-       locale in the same way as before.  Characters greater  than  0xff  have
+       that  they  can  range from 0 to 0xffff instead of 0 to 0xff. Character
+       types for characters less than 0xff can therefore be influenced by  the
+       locale  in  the  same way as before.  Characters greater than 0xff have
        only one case, and no "type" (such as letter or digit).


-       In  UTF-16  mode,  the  character  code  is  Unicode, in the range 0 to
-       0x10ffff, with the exception of values in the range  0xd800  to  0xdfff
-       because  those  are "surrogate" values that are used in pairs to encode
+       In UTF-16 mode, the character code  is  Unicode,  in  the  range  0  to
+       0x10ffff,  with  the  exception of values in the range 0xd800 to 0xdfff
+       because those are "surrogate" values that are used in pairs  to  encode
        values greater than 0xffff.


-       A UTF-16 string can indicate its endianness by special code knows as  a
+       A  UTF-16 string can indicate its endianness by special code knows as a
        byte-order mark (BOM). The PCRE functions do not handle this, expecting
-       strings  to  be  in  host  byte  order.  A  utility   function   called
-       pcre16_utf16_to_host_byte_order()  is  provided  to help with this (see
+       strings   to   be  in  host  byte  order.  A  utility  function  called
+       pcre16_utf16_to_host_byte_order() is provided to help  with  this  (see
        above).



ERROR NAMES

-       The errors PCRE_ERROR_BADUTF16_OFFSET and PCRE_ERROR_SHORTUTF16  corre-
-       spond  to  their  8-bit  counterparts.  The error PCRE_ERROR_BADMODE is
-       given when a compiled pattern is passed to a  function  that  processes
-       patterns  in  the  other  mode, for example, if a pattern compiled with
+       The  errors PCRE_ERROR_BADUTF16_OFFSET and PCRE_ERROR_SHORTUTF16 corre-
+       spond to their 8-bit  counterparts.  The  error  PCRE_ERROR_BADMODE  is
+       given  when  a  compiled pattern is passed to a function that processes
+       patterns in the other mode, for example, if  a  pattern  compiled  with
        pcre_compile() is passed to pcre16_exec().


-       There are new error codes whose names  begin  with  PCRE_UTF16_ERR  for
-       invalid  UTF-16  strings,  corresponding to the PCRE_UTF8_ERR codes for
-       UTF-8 strings that are described in the section entitled "Reason  codes
-       for  invalid UTF-8 strings" in the main pcreapi page. The UTF-16 errors
+       There  are  new  error  codes whose names begin with PCRE_UTF16_ERR for
+       invalid UTF-16 strings, corresponding to the  PCRE_UTF8_ERR  codes  for
+       UTF-8  strings that are described in the section entitled "Reason codes
+       for invalid UTF-8 strings" in the main pcreapi page. The UTF-16  errors
        are:


          PCRE_UTF16_ERR1  Missing low surrogate at end of string
@@ -418,36 +419,36 @@


ERROR TEXTS

-       If there is an error while compiling a pattern, the error text that  is
-       passed  back by pcre16_compile() or pcre16_compile2() is still an 8-bit
+       If  there is an error while compiling a pattern, the error text that is
+       passed back by pcre16_compile() or pcre16_compile2() is still an  8-bit
        character string, zero-terminated.



CALLOUTS

-       The subject and mark fields in the callout block that is  passed  to  a
+       The  subject  and  mark fields in the callout block that is passed to a
        callout function point to 16-bit vectors.



TESTING

-       The  pcretest  program continues to operate with 8-bit input and output
-       files, but it can be used for testing the 16-bit library. If it is  run
+       The pcretest program continues to operate with 8-bit input  and  output
+       files,  but it can be used for testing the 16-bit library. If it is run
        with the command line option -16, patterns and subject strings are con-
        verted from 8-bit to 16-bit before being passed to PCRE, and the 16-bit
-       library  functions  are used instead of the 8-bit ones. Returned 16-bit
+       library functions are used instead of the 8-bit ones.  Returned  16-bit
        strings are converted to 8-bit for output. If the 8-bit library was not
        compiled, pcretest defaults to 16-bit and the -16 option is ignored.


-       When  PCRE  is  being built, the RunTest script that is called by "make
-       check" uses the pcretest -C option to discover which of the  8-bit  and
+       When PCRE is being built, the RunTest script that is  called  by  "make
+       check"  uses  the pcretest -C option to discover which of the 8-bit and
        16-bit libraries has been built, and runs the tests appropriately.



NOT SUPPORTED IN 16-BIT MODE

        Not all the features of the 8-bit library are available with the 16-bit
-       library. The C++ and POSIX wrapper functions  support  only  the  8-bit
+       library.  The  C++  and  POSIX wrapper functions support only the 8-bit
        library, and the pcregrep program is at present 8-bit only.



@@ -460,7 +461,7 @@

REVISION

-       Last updated: 08 January 2012
+       Last updated: 14 April 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------


@@ -2656,200 +2657,201 @@

        When PCRE_UTF8 is set at compile time, the validity of the subject as a
        UTF-8  string is automatically checked when pcre_exec() is subsequently
-       called.  The value of startoffset is also checked  to  ensure  that  it
-       points  to  the start of a UTF-8 character. There is a discussion about
-       the validity of UTF-8 strings in the pcreunicode page.  If  an  invalid
-       sequence   of   bytes   is   found,   pcre_exec()   returns  the  error
+       called.  The entire string is checked before any other processing takes
+       place.  The  value  of  startoffset  is  also checked to ensure that it
+       points to the start of a UTF-8 character. There is a  discussion  about
+       the  validity  of  UTF-8 strings in the pcreunicode page. If an invalid
+       sequence  of  bytes   is   found,   pcre_exec()   returns   the   error
        PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a
        truncated character at the end of the subject, PCRE_ERROR_SHORTUTF8. In
-       both cases, information about the precise nature of the error may  also
-       be  returned (see the descriptions of these errors in the section enti-
-       tled Error return values from pcre_exec() below).  If startoffset  con-
+       both  cases, information about the precise nature of the error may also
+       be returned (see the descriptions of these errors in the section  enti-
+       tled  Error return values from pcre_exec() below).  If startoffset con-
        tains a value that does not point to the start of a UTF-8 character (or
        to the end of the subject), PCRE_ERROR_BADUTF8_OFFSET is returned.


-       If you already know that your subject is valid, and you  want  to  skip
-       these    checks    for   performance   reasons,   you   can   set   the
-       PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might  want  to
-       do  this  for the second and subsequent calls to pcre_exec() if you are
-       making repeated calls to find all  the  matches  in  a  single  subject
-       string.  However,  you  should  be  sure  that the value of startoffset
-       points to the start of a character (or the end of  the  subject).  When
+       If  you  already  know that your subject is valid, and you want to skip
+       these   checks   for   performance   reasons,   you   can    set    the
+       PCRE_NO_UTF8_CHECK  option  when calling pcre_exec(). You might want to
+       do this for the second and subsequent calls to pcre_exec() if  you  are
+       making  repeated  calls  to  find  all  the matches in a single subject
+       string. However, you should be  sure  that  the  value  of  startoffset
+       points  to  the  start of a character (or the end of the subject). When
        PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid string as a
-       subject or an invalid value of startoffset is undefined.  Your  program
+       subject  or  an invalid value of startoffset is undefined. Your program
        may crash.


          PCRE_PARTIAL_HARD
          PCRE_PARTIAL_SOFT


-       These  options turn on the partial matching feature. For backwards com-
-       patibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A  partial
-       match  occurs if the end of the subject string is reached successfully,
-       but there are not enough subject characters to complete the  match.  If
+       These options turn on the partial matching feature. For backwards  com-
+       patibility,  PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial
+       match occurs if the end of the subject string is reached  successfully,
+       but  there  are not enough subject characters to complete the match. If
        this happens when PCRE_PARTIAL_SOFT (but not PCRE_PARTIAL_HARD) is set,
-       matching continues by testing any remaining alternatives.  Only  if  no
-       complete  match  can be found is PCRE_ERROR_PARTIAL returned instead of
-       PCRE_ERROR_NOMATCH. In other words,  PCRE_PARTIAL_SOFT  says  that  the
-       caller  is  prepared to handle a partial match, but only if no complete
+       matching  continues  by  testing any remaining alternatives. Only if no
+       complete match can be found is PCRE_ERROR_PARTIAL returned  instead  of
+       PCRE_ERROR_NOMATCH.  In  other  words,  PCRE_PARTIAL_SOFT says that the
+       caller is prepared to handle a partial match, but only if  no  complete
        match can be found.


-       If PCRE_PARTIAL_HARD is set, it overrides  PCRE_PARTIAL_SOFT.  In  this
-       case,  if  a  partial  match  is found, pcre_exec() immediately returns
-       PCRE_ERROR_PARTIAL, without  considering  any  other  alternatives.  In
-       other  words, when PCRE_PARTIAL_HARD is set, a partial match is consid-
+       If  PCRE_PARTIAL_HARD  is  set, it overrides PCRE_PARTIAL_SOFT. In this
+       case, if a partial match  is  found,  pcre_exec()  immediately  returns
+       PCRE_ERROR_PARTIAL,  without  considering  any  other  alternatives. In
+       other words, when PCRE_PARTIAL_HARD is set, a partial match is  consid-
        ered to be more important that an alternative complete match.


-       In both cases, the portion of the string that was  inspected  when  the
+       In  both  cases,  the portion of the string that was inspected when the
        partial match was found is set as the first matching string. There is a
-       more detailed discussion of partial and  multi-segment  matching,  with
+       more  detailed  discussion  of partial and multi-segment matching, with
        examples, in the pcrepartial documentation.


    The string to be matched by pcre_exec()


-       The  subject string is passed to pcre_exec() as a pointer in subject, a
-       length in bytes in length, and a starting byte offset  in  startoffset.
-       If  this  is  negative  or  greater  than  the  length  of the subject,
-       pcre_exec() returns PCRE_ERROR_BADOFFSET. When the starting  offset  is
-       zero,  the  search  for a match starts at the beginning of the subject,
+       The subject string is passed to pcre_exec() as a pointer in subject,  a
+       length  in  bytes in length, and a starting byte offset in startoffset.
+       If this is  negative  or  greater  than  the  length  of  the  subject,
+       pcre_exec()  returns  PCRE_ERROR_BADOFFSET. When the starting offset is
+       zero, the search for a match starts at the beginning  of  the  subject,
        and this is by far the most common case. In UTF-8 mode, the byte offset
-       must  point  to  the start of a UTF-8 character (or the end of the sub-
-       ject). Unlike the pattern string, the subject may contain  binary  zero
+       must point to the start of a UTF-8 character (or the end  of  the  sub-
+       ject).  Unlike  the pattern string, the subject may contain binary zero
        bytes.


-       A  non-zero  starting offset is useful when searching for another match
-       in the same subject by calling pcre_exec() again after a previous  suc-
-       cess.   Setting  startoffset differs from just passing over a shortened
-       string and setting PCRE_NOTBOL in the case of  a  pattern  that  begins
+       A non-zero starting offset is useful when searching for  another  match
+       in  the same subject by calling pcre_exec() again after a previous suc-
+       cess.  Setting startoffset differs from just passing over  a  shortened
+       string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins
        with any kind of lookbehind. For example, consider the pattern


          \Biss\B


-       which  finds  occurrences  of "iss" in the middle of words. (\B matches
-       only if the current position in the subject is not  a  word  boundary.)
-       When  applied  to the string "Mississipi" the first call to pcre_exec()
-       finds the first occurrence. If pcre_exec() is called  again  with  just
-       the  remainder  of  the  subject,  namely  "issipi", it does not match,
+       which finds occurrences of "iss" in the middle of  words.  (\B  matches
+       only  if  the  current position in the subject is not a word boundary.)
+       When applied to the string "Mississipi" the first call  to  pcre_exec()
+       finds  the  first  occurrence. If pcre_exec() is called again with just
+       the remainder of the subject,  namely  "issipi",  it  does  not  match,
        because \B is always false at the start of the subject, which is deemed
-       to  be  a  word  boundary. However, if pcre_exec() is passed the entire
+       to be a word boundary. However, if pcre_exec()  is  passed  the  entire
        string again, but with startoffset set to 4, it finds the second occur-
-       rence  of "iss" because it is able to look behind the starting point to
+       rence of "iss" because it is able to look behind the starting point  to
        discover that it is preceded by a letter.


-       Finding all the matches in a subject is tricky  when  the  pattern  can
+       Finding  all  the  matches  in a subject is tricky when the pattern can
        match an empty string. It is possible to emulate Perl's /g behaviour by
-       first  trying  the  match  again  at  the   same   offset,   with   the
-       PCRE_NOTEMPTY_ATSTART  and  PCRE_ANCHORED  options,  and  then  if that
-       fails, advancing the starting  offset  and  trying  an  ordinary  match
+       first   trying   the   match   again  at  the  same  offset,  with  the
+       PCRE_NOTEMPTY_ATSTART and  PCRE_ANCHORED  options,  and  then  if  that
+       fails,  advancing  the  starting  offset  and  trying an ordinary match
        again. There is some code that demonstrates how to do this in the pcre-
        demo sample program. In the most general case, you have to check to see
-       if  the newline convention recognizes CRLF as a newline, and if so, and
+       if the newline convention recognizes CRLF as a newline, and if so,  and
        the current character is CR followed by LF, advance the starting offset
        by two characters instead of one.


-       If  a  non-zero starting offset is passed when the pattern is anchored,
+       If a non-zero starting offset is passed when the pattern  is  anchored,
        one attempt to match at the given offset is made. This can only succeed
-       if  the  pattern  does  not require the match to be at the start of the
+       if the pattern does not require the match to be at  the  start  of  the
        subject.


    How pcre_exec() returns captured substrings


-       In general, a pattern matches a certain portion of the subject, and  in
-       addition,  further  substrings  from  the  subject may be picked out by
-       parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,
-       this  is  called "capturing" in what follows, and the phrase "capturing
-       subpattern" is used for a fragment of a pattern that picks out  a  sub-
-       string.  PCRE  supports several other kinds of parenthesized subpattern
+       In  general, a pattern matches a certain portion of the subject, and in
+       addition, further substrings from the subject  may  be  picked  out  by
+       parts  of  the  pattern.  Following the usage in Jeffrey Friedl's book,
+       this is called "capturing" in what follows, and the  phrase  "capturing
+       subpattern"  is  used for a fragment of a pattern that picks out a sub-
+       string. PCRE supports several other kinds of  parenthesized  subpattern
        that do not cause substrings to be captured.


        Captured substrings are returned to the caller via a vector of integers
-       whose  address is passed in ovector. The number of elements in the vec-
-       tor is passed in ovecsize, which must be a non-negative  number.  Note:
+       whose address is passed in ovector. The number of elements in the  vec-
+       tor  is  passed in ovecsize, which must be a non-negative number. Note:
        this argument is NOT the size of ovector in bytes.


-       The  first  two-thirds of the vector is used to pass back captured sub-
-       strings, each substring using a pair of integers. The  remaining  third
-       of  the  vector is used as workspace by pcre_exec() while matching cap-
-       turing subpatterns, and is not available for passing back  information.
-       The  number passed in ovecsize should always be a multiple of three. If
+       The first two-thirds of the vector is used to pass back  captured  sub-
+       strings,  each  substring using a pair of integers. The remaining third
+       of the vector is used as workspace by pcre_exec() while  matching  cap-
+       turing  subpatterns, and is not available for passing back information.
+       The number passed in ovecsize should always be a multiple of three.  If
        it is not, it is rounded down.


-       When a match is successful, information about  captured  substrings  is
-       returned  in  pairs  of integers, starting at the beginning of ovector,
-       and continuing up to two-thirds of its length at the  most.  The  first
-       element  of  each pair is set to the byte offset of the first character
-       in a substring, and the second is set to the byte offset of  the  first
-       character  after  the end of a substring. Note: these values are always
+       When  a  match  is successful, information about captured substrings is
+       returned in pairs of integers, starting at the  beginning  of  ovector,
+       and  continuing  up  to two-thirds of its length at the most. The first
+       element of each pair is set to the byte offset of the  first  character
+       in  a  substring, and the second is set to the byte offset of the first
+       character after the end of a substring. Note: these values  are  always
        byte offsets, even in UTF-8 mode. They are not character counts.


-       The first pair of integers, ovector[0]  and  ovector[1],  identify  the
-       portion  of  the subject string matched by the entire pattern. The next
-       pair is used for the first capturing subpattern, and so on.  The  value
+       The  first  pair  of  integers, ovector[0] and ovector[1], identify the
+       portion of the subject string matched by the entire pattern.  The  next
+       pair  is  used for the first capturing subpattern, and so on. The value
        returned by pcre_exec() is one more than the highest numbered pair that
-       has been set.  For example, if two substrings have been  captured,  the
-       returned  value is 3. If there are no capturing subpatterns, the return
+       has  been  set.  For example, if two substrings have been captured, the
+       returned value is 3. If there are no capturing subpatterns, the  return
        value from a successful match is 1, indicating that just the first pair
        of offsets has been set.


        If a capturing subpattern is matched repeatedly, it is the last portion
        of the string that it matched that is returned.


-       If the vector is too small to hold all the captured substring  offsets,
+       If  the vector is too small to hold all the captured substring offsets,
        it is used as far as possible (up to two-thirds of its length), and the
-       function returns a value of zero. If neither the actual string  matched
-       nor  any captured substrings are of interest, pcre_exec() may be called
-       with ovector passed as NULL and ovecsize as zero. However, if the  pat-
-       tern  contains  back  references  and  the ovector is not big enough to
-       remember the related substrings, PCRE has to get additional memory  for
-       use  during matching. Thus it is usually advisable to supply an ovector
+       function  returns a value of zero. If neither the actual string matched
+       nor any captured substrings are of interest, pcre_exec() may be  called
+       with  ovector passed as NULL and ovecsize as zero. However, if the pat-
+       tern contains back references and the ovector  is  not  big  enough  to
+       remember  the related substrings, PCRE has to get additional memory for
+       use during matching. Thus it is usually advisable to supply an  ovector
        of reasonable size.


-       There are some cases where zero is returned  (indicating  vector  over-
-       flow)  when  in fact the vector is exactly the right size for the final
+       There  are  some  cases where zero is returned (indicating vector over-
+       flow) when in fact the vector is exactly the right size for  the  final
        match. For example, consider the pattern


          (a)(?:(b)c|bd)


-       If a vector of 6 elements (allowing for only 1 captured  substring)  is
+       If  a  vector of 6 elements (allowing for only 1 captured substring) is
        given with subject string "abd", pcre_exec() will try to set the second
        captured string, thereby recording a vector overflow, before failing to
-       match  "c"  and  backing  up  to  try  the second alternative. The zero
-       return, however, does correctly indicate that  the  maximum  number  of
+       match "c" and backing up  to  try  the  second  alternative.  The  zero
+       return,  however,  does  correctly  indicate that the maximum number of
        slots (namely 2) have been filled. In similar cases where there is tem-
-       porary overflow, but the final number of used slots  is  actually  less
+       porary  overflow,  but  the final number of used slots is actually less
        than the maximum, a non-zero value is returned.


        The pcre_fullinfo() function can be used to find out how many capturing
-       subpatterns there are in a compiled  pattern.  The  smallest  size  for
-       ovector  that  will allow for n captured substrings, in addition to the
+       subpatterns  there  are  in  a  compiled pattern. The smallest size for
+       ovector that will allow for n captured substrings, in addition  to  the
        offsets of the substring matched by the whole pattern, is (n+1)*3.
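
        The sizing rule can be sketched as a small helper. This is only an
        illustration: ovector_size_for() is a hypothetical name, not part
        of the PCRE API, and the capture count n is assumed to have been
        obtained elsewhere, for example from pcre_fullinfo().

```c
#include <stddef.h>

/* Hypothetical helper, not a PCRE function.
   Smallest ovector length (counted in ints) for a pattern with
   n_captures capturing subpatterns: one offset pair for the whole
   match plus one pair per capture gives 2*(n+1) ints, and the
   vector must be half as large again, giving (n+1)*3. */
static size_t ovector_size_for(size_t n_captures)
{
    return (n_captures + 1) * 3;
}
```

        A caller might declare the vector as int ovector[30] for a pattern
        with up to nine capturing subpatterns.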


-       It is possible for capturing subpattern number n+1 to match  some  part
+       It  is  possible for capturing subpattern number n+1 to match some part
        of the subject when subpattern n has not been used at all. For example,
-       if the string "abc" is matched  against  the  pattern  (a|(z))(bc)  the
+       if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the
        return from the function is 4, and subpatterns 1 and 3 are matched, but
-       2 is not. When this happens, both values in  the  offset  pairs  corre-
+       2  is  not.  When  this happens, both values in the offset pairs corre-
        sponding to unused subpatterns are set to -1.
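
        A minimal sketch of how a caller might walk the offset pairs and
        skip unset subpatterns. The helper names group_is_set() and
        print_groups() are hypothetical, and the ovector contents are
        assumed to come from a successful pcre_exec() call; the code
        itself does not call PCRE.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if capture group n took part in the
   match, 0 if both of its offsets were left at -1 (unset). */
static int group_is_set(const int *ovector, int n)
{
    return ovector[2 * n] >= 0 && ovector[2 * n + 1] >= 0;
}

/* Hypothetical helper: prints every group covered by a positive
   pcre_exec()-style return value rc and returns how many were set. */
static int print_groups(const char *subject, const int *ovector, int rc)
{
    int set_count = 0;
    for (int i = 0; i < rc; i++) {
        if (group_is_set(ovector, i)) {
            int len = ovector[2 * i + 1] - ovector[2 * i];
            printf("%d: %.*s\n", i, len, subject + ovector[2 * i]);
            set_count++;
        } else {
            printf("%d: <unset>\n", i);
        }
    }
    return set_count;
}
```

        For the example above ("abc" against (a|(z))(bc), return value 4),
        the pairs would be {0,3}, {0,1}, {-1,-1} and {1,3}, so group 2 is
        reported as unset.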


-       Offset  values  that correspond to unused subpatterns at the end of the
-       expression are also set to -1. For example,  if  the  string  "abc"  is
-       matched  against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not
-       matched. The return from the function is 2, because  the  highest  used
-       capturing  subpattern  number  is  1,  and  the offsets for the second
-       and third capturing subpatterns (assuming the vector is  large  enough,
+       Offset values that correspond to unused subpatterns at the end  of  the
+       expression  are  also  set  to  -1. For example, if the string "abc" is
+       matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are  not
+       matched.  The  return  from the function is 2, because the highest used
+       capturing subpattern number is 1, and the  offsets  for  the  second
+       and  third  capturing subpatterns (assuming the vector is large enough,
        of course) are set to -1.


-       Note:  Elements  in  the first two-thirds of ovector that do not corre-
-       spond to capturing parentheses in the pattern are never  changed.  That
-       is,  if  a pattern contains n capturing parentheses, no more than ovec-
-       tor[0] to ovector[2n+1] are set by pcre_exec(). The other elements  (in
+       Note: Elements in the first two-thirds of ovector that  do  not  corre-
+       spond  to  capturing parentheses in the pattern are never changed. That
+       is, if a pattern contains n capturing parentheses, no more  than  ovec-
+       tor[0]  to ovector[2n+1] are set by pcre_exec(). The other elements (in
        the first two-thirds) retain whatever values they previously had.


-       Some  convenience  functions  are  provided for extracting the captured
+       Some convenience functions are provided  for  extracting  the  captured
        substrings as separate strings. These are described below.


    Error return values from pcre_exec()


-       If pcre_exec() fails, it returns a negative number. The  following  are
+       If  pcre_exec()  fails, it returns a negative number. The following are
        defined in the header file:


          PCRE_ERROR_NOMATCH        (-1)
@@ -2858,7 +2860,7 @@


          PCRE_ERROR_NULL           (-2)


-       Either  code  or  subject  was  passed as NULL, or ovector was NULL and
+       Either code or subject was passed as NULL,  or  ovector  was  NULL  and
        ovecsize was not zero.


          PCRE_ERROR_BADOPTION      (-3)
@@ -2867,82 +2869,82 @@


          PCRE_ERROR_BADMAGIC       (-4)


-       PCRE stores a 4-byte "magic number" at the start of the compiled  code,
+       PCRE  stores a 4-byte "magic number" at the start of the compiled code,
        to catch the case when it is passed a junk pointer and to detect when a
        pattern that was compiled in an environment of one endianness is run in
-       an  environment  with the other endianness. This is the error that PCRE
+       an environment with the other endianness. This is the error  that  PCRE
        gives when the magic number is not present.


          PCRE_ERROR_UNKNOWN_OPCODE (-5)


        While running the pattern match, an unknown item was encountered in the
-       compiled  pattern.  This  error  could be caused by a bug in PCRE or by
+       compiled pattern. This error could be caused by a bug  in  PCRE  or  by
        overwriting of the compiled pattern.


          PCRE_ERROR_NOMEMORY       (-6)


-       If a pattern contains back references, but the ovector that  is  passed
+       If  a  pattern contains back references, but the ovector that is passed
        to pcre_exec() is not big enough to remember the referenced substrings,
-       PCRE gets a block of memory at the start of matching to  use  for  this
-       purpose.  If the call via pcre_malloc() fails, this error is given. The
+       PCRE  gets  a  block of memory at the start of matching to use for this
+       purpose. If the call via pcre_malloc() fails, this error is given.  The
        memory is automatically freed at the end of matching.


-       This error is also given if pcre_stack_malloc() fails  in  pcre_exec().
-       This  can happen only when PCRE has been compiled with --disable-stack-
+       This  error  is also given if pcre_stack_malloc() fails in pcre_exec().
+       This can happen only when PCRE has been compiled with  --disable-stack-
        for-recursion.


          PCRE_ERROR_NOSUBSTRING    (-7)


-       This error is used by the pcre_copy_substring(),  pcre_get_substring(),
+       This  error is used by the pcre_copy_substring(), pcre_get_substring(),
        and  pcre_get_substring_list()  functions  (see  below).  It  is  never
        returned by pcre_exec().


          PCRE_ERROR_MATCHLIMIT     (-8)


-       The backtracking limit, as specified by  the  match_limit  field  in  a
-       pcre_extra  structure  (or  defaulted) was reached. See the description
+       The  backtracking  limit,  as  specified  by the match_limit field in a
+       pcre_extra structure (or defaulted) was reached.  See  the  description
        above.


          PCRE_ERROR_CALLOUT        (-9)


        This error is never generated by pcre_exec() itself. It is provided for
-       use  by  callout functions that want to yield a distinctive error code.
+       use by callout functions that want to yield a distinctive  error  code.
        See the pcrecallout documentation for details.


          PCRE_ERROR_BADUTF8        (-10)


-       A string that contains an invalid UTF-8 byte sequence was passed  as  a
-       subject,  and the PCRE_NO_UTF8_CHECK option was not set. If the size of
-       the output vector (ovecsize) is at least 2,  the  byte  offset  to  the
-       start  of  the  invalid  UTF-8 character is placed in the first ele-
-       ment, and a reason code is placed in the  second  element.  The  reason
+       A  string  that contains an invalid UTF-8 byte sequence was passed as a
+       subject, and the PCRE_NO_UTF8_CHECK option was not set. If the size  of
+       the  output  vector  (ovecsize)  is  at least 2, the byte offset to the
+       start of the invalid UTF-8 character  is  placed  in  the  first  ele-
+       ment,  and  a  reason  code is placed in the second element. The reason
        codes are listed in the following section.  For backward compatibility,
-       if PCRE_PARTIAL_HARD is set and the problem is a truncated UTF-8  char-
-       acter   at   the   end   of   the   subject  (reason  codes  1  to  5),
+       if  PCRE_PARTIAL_HARD is set and the problem is a truncated UTF-8 char-
+       acter  at  the  end  of  the   subject   (reason   codes   1   to   5),
        PCRE_ERROR_SHORTUTF8 is returned instead of PCRE_ERROR_BADUTF8.


          PCRE_ERROR_BADUTF8_OFFSET (-11)


-       The UTF-8 byte sequence that was passed as a subject  was  checked  and
-       found  to be valid (the PCRE_NO_UTF8_CHECK option was not set), but the
-       value of startoffset did not point to the beginning of a UTF-8  charac-
+       The  UTF-8  byte  sequence that was passed as a subject was checked and
+       found to be valid (the PCRE_NO_UTF8_CHECK option was not set), but  the
+       value  of startoffset did not point to the beginning of a UTF-8 charac-
        ter or the end of the subject.


          PCRE_ERROR_PARTIAL        (-12)


-       The  subject  string did not match, but it did match partially. See the
+       The subject string did not match, but it did match partially.  See  the
        pcrepartial documentation for details of partial matching.


          PCRE_ERROR_BADPARTIAL     (-13)


-       This code is no longer in  use.  It  was  formerly  returned  when  the
-       PCRE_PARTIAL  option  was used with a compiled pattern containing items
-       that were  not  supported  for  partial  matching.  From  release  8.00
+       This  code  is  no  longer  in  use.  It was formerly returned when the
+       PCRE_PARTIAL option was used with a compiled pattern  containing  items
+       that  were  not  supported  for  partial  matching.  From  release 8.00
        onwards, there are no restrictions on partial matching.


          PCRE_ERROR_INTERNAL       (-14)


-       An  unexpected  internal error has occurred. This error could be caused
+       An unexpected internal error has occurred. This error could  be  caused
        by a bug in PCRE or by overwriting of the compiled pattern.


          PCRE_ERROR_BADCOUNT       (-15)
@@ -2952,7 +2954,7 @@
          PCRE_ERROR_RECURSIONLIMIT (-21)


        The internal recursion limit, as specified by the match_limit_recursion
-       field  in  a  pcre_extra  structure (or defaulted) was reached. See the
+       field in a pcre_extra structure (or defaulted)  was  reached.  See  the
        description above.


          PCRE_ERROR_BADNEWLINE     (-23)
@@ -2966,29 +2968,29 @@


          PCRE_ERROR_SHORTUTF8      (-25)


-       This  error  is returned instead of PCRE_ERROR_BADUTF8 when the subject
-       string ends with a truncated UTF-8 character and the  PCRE_PARTIAL_HARD
-       option  is  set.   Information  about  the  failure  is returned as for
-       PCRE_ERROR_BADUTF8. It is in fact sufficient to detect this  case,  but
-       this  special error code for PCRE_PARTIAL_HARD precedes the implementa-
-       tion of returned information; it is retained for backwards  compatibil-
+       This error is returned instead of PCRE_ERROR_BADUTF8 when  the  subject
+       string  ends with a truncated UTF-8 character and the PCRE_PARTIAL_HARD
+       option is set.  Information  about  the  failure  is  returned  as  for
+       PCRE_ERROR_BADUTF8.  It  is in fact sufficient to detect this case, but
+       this special error code for PCRE_PARTIAL_HARD precedes the  implementa-
+       tion  of returned information; it is retained for backwards compatibil-
        ity.


          PCRE_ERROR_RECURSELOOP    (-26)


        This error is returned when pcre_exec() detects a recursion loop within
-       the pattern. Specifically, it means that either the whole pattern or  a
-       subpattern  has been called recursively for the second time at the same
+       the  pattern. Specifically, it means that either the whole pattern or a
+       subpattern has been called recursively for the second time at the  same
        position in the subject string. Some simple patterns that might do this
-       are  detected  and faulted at compile time, but more complicated cases,
+       are detected and faulted at compile time, but more  complicated  cases,
        in particular mutual recursions between two different subpatterns, can-
        not be detected until run time.


          PCRE_ERROR_JIT_STACKLIMIT (-27)


-       This  error  is  returned  when a pattern that was successfully studied
-       using a JIT compile option is being matched, but the  memory  available
-       for  the  just-in-time  processing  stack  is not large enough. See the
+       This error is returned when a pattern  that  was  successfully  studied
+       using  a  JIT compile option is being matched, but the memory available
+       for the just-in-time processing stack is  not  large  enough.  See  the
        pcrejit documentation for more details.


          PCRE_ERROR_BADMODE (-28)
@@ -2998,8 +3000,8 @@


          PCRE_ERROR_BADENDIANNESS (-29)


-       This  error  is  given  if  a  pattern  that  was compiled and saved is
-       reloaded on a host with  different  endianness.  The  utility  function
+       This error is given if  a  pattern  that  was  compiled  and  saved  is
+       reloaded  on  a  host  with  different endianness. The utility function
        pcre_pattern_to_host_byte_order() can be used to convert such a pattern
        so that it runs on the new host.


@@ -3007,14 +3009,14 @@

    Reason codes for invalid UTF-8 strings


-       This section applies only  to  the  8-bit  library.  The  corresponding
+       This  section  applies  only  to  the  8-bit library. The corresponding
        information for the 16-bit library is given in the pcre16 page.


        When pcre_exec() returns either PCRE_ERROR_BADUTF8 or PCRE_ERROR_SHORT-
-       UTF8, and the size of the output vector (ovecsize) is at least  2,  the
-       offset  of  the  start  of the invalid UTF-8 character is placed in the
+       UTF8,  and  the size of the output vector (ovecsize) is at least 2, the
+       offset of the start of the invalid UTF-8 character  is  placed  in  the
        first output vector element (ovector[0]) and a reason code is placed in
-       the  second  element  (ovector[1]). The reason codes are given names in
+       the second element (ovector[1]). The reason codes are  given  names  in
        the pcre.h header file:


          PCRE_UTF8_ERR1
@@ -3023,10 +3025,10 @@
          PCRE_UTF8_ERR4
          PCRE_UTF8_ERR5


-       The string ends with a truncated UTF-8 character;  the  code  specifies
-       how  many bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8
-       characters to be no longer than 4 bytes, the  encoding  scheme  (origi-
-       nally  defined  by  RFC  2279)  allows  for  up to 6 bytes, and this is
+       The  string  ends  with a truncated UTF-8 character; the code specifies
+       how many bytes are missing (1 to 5). Although RFC 3629 restricts  UTF-8
+       characters  to  be  no longer than 4 bytes, the encoding scheme (origi-
+       nally defined by RFC 2279) allows for  up  to  6  bytes,  and  this  is
        checked first; hence the possibility of 4 or 5 missing bytes.
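
        The "missing bytes" count can be illustrated with a short sketch.
        It is not PCRE's own validation code: both function names are
        hypothetical, and the lead-byte lengths follow the RFC 2279 scheme
        of up to 6 bytes mentioned above.

```c
#include <stddef.h>

/* Hypothetical helper: length implied by a lead byte under the
   RFC 2279 scheme (1 to 6), or 0 if the byte cannot start a
   character. */
static int utf8_expected_length(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead < 0xc0) return 0;   /* continuation byte, 10xxxxxx */
    if (lead < 0xe0) return 2;
    if (lead < 0xf0) return 3;
    if (lead < 0xf8) return 4;
    if (lead < 0xfc) return 5;
    if (lead < 0xfe) return 6;
    return 0;                    /* 0xfe and 0xff never start a character */
}

/* Hypothetical helper: number of bytes missing from the character
   that starts at s[start] in a subject of len bytes; 0 if the
   character is complete or the lead byte is invalid. */
static int utf8_missing_bytes(const unsigned char *s, size_t len, size_t start)
{
    int need = utf8_expected_length(s[start]);
    size_t have = len - start;
    if (need == 0 || have >= (size_t)need) return 0;
    return need - (int)have;
}
```

        A subject ending in the single byte 0xfc announces a 6-byte
        character with 5 bytes missing, which is how a count of 5 can
        arise even though RFC 3629 stops at 4 bytes.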


          PCRE_UTF8_ERR6
@@ -3036,24 +3038,24 @@
          PCRE_UTF8_ERR10


        The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of
-       the  character  do  not have the binary value 0b10 (that is, either the
+       the character do not have the binary value 0b10 (that  is,  either  the
        most significant bit is 0, or the next bit is 1).


          PCRE_UTF8_ERR11
          PCRE_UTF8_ERR12


-       A character that is valid by the RFC 2279 rules is either 5 or 6  bytes
+       A  character that is valid by the RFC 2279 rules is either 5 or 6 bytes
        long; these code points are excluded by RFC 3629.


          PCRE_UTF8_ERR13


-       A  4-byte character has a value greater than 0x10ffff; these code points
+       A 4-byte character has a value greater than 0x10ffff; these code  points
        are excluded by RFC 3629.


          PCRE_UTF8_ERR14


-       A 3-byte character has a value in the  range  0xd800  to  0xdfff;  this
-       range  of  code points is reserved by RFC 3629 for use with UTF-16, and
+       A  3-byte  character  has  a  value in the range 0xd800 to 0xdfff; this
+       range of code points is reserved by RFC 3629 for use with  UTF-16,  and
        so is excluded from UTF-8.


          PCRE_UTF8_ERR15
@@ -3062,21 +3064,21 @@
          PCRE_UTF8_ERR18
          PCRE_UTF8_ERR19


-       A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it  codes
-       for  a  value that can be represented by fewer bytes, which is invalid.
-       For example, the two bytes 0xc0, 0xae give the value 0x2e,  whose  cor-
+       A  2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes
+       for a value that can be represented by fewer bytes, which  is  invalid.
+       For  example,  the two bytes 0xc0, 0xae give the value 0x2e, whose cor-
        rect coding uses just one byte.
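
        The arithmetic behind the 0xc0, 0xae example can be checked with a
        small sketch. The function names are hypothetical, and the code
        covers only the 2-byte case; it is not how PCRE itself performs
        the check.

```c
/* Hypothetical helper: decodes a 2-byte UTF-8 sequence without
   validating it. The lead byte carries 5 payload bits and the
   continuation byte carries 6. */
static unsigned decode_2byte(unsigned char b0, unsigned char b1)
{
    return ((b0 & 0x1fu) << 6) | (b1 & 0x3fu);
}

/* Hypothetical helper: a 2-byte sequence is overlong if its value
   would fit in a single byte, that is, if it is below 0x80. */
static int is_overlong_2byte(unsigned char b0, unsigned char b1)
{
    return decode_2byte(b0, b1) < 0x80u;
}
```

        Here 0xc0 contributes no payload bits and 0xae contributes 0x2e,
        so the decoded value is 0x2e, which a single byte could have
        carried.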


          PCRE_UTF8_ERR20


        The two most significant bits of the first byte of a character have the
-       binary value 0b10 (that is, the most significant bit is 1 and the  sec-
-       ond  is  0). Such a byte can only validly occur as the second or subse-
+       binary  value 0b10 (that is, the most significant bit is 1 and the sec-
+       ond is 0). Such a byte can only validly occur as the second  or  subse-
        quent byte of a multi-byte character.


          PCRE_UTF8_ERR21


-       The first byte of a character has the value 0xfe or 0xff. These  values
+       The  first byte of a character has the value 0xfe or 0xff. These values
        can never occur in a valid UTF-8 string.



@@ -3093,78 +3095,78 @@
        int pcre_get_substring_list(const char *subject,
             int *ovector, int stringcount, const char ***listptr);


-       Captured  substrings  can  be  accessed  directly  by using the offsets
-       returned by pcre_exec() in  ovector.  For  convenience,  the  functions
+       Captured substrings can be  accessed  directly  by  using  the  offsets
+       returned  by  pcre_exec()  in  ovector.  For convenience, the functions
        pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-
-       string_list() are provided for extracting captured substrings  as  new,
-       separate,  zero-terminated strings. These functions identify substrings
-       by number. The next section describes functions  for  extracting  named
+       string_list()  are  provided for extracting captured substrings as new,
+       separate, zero-terminated strings. These functions identify  substrings
+       by  number.  The  next section describes functions for extracting named
        substrings.


-       A  substring that contains a binary zero is correctly extracted and has
-       a further zero added on the end, but the result is not, of course, a  C
-       string.   However,  you  can  process such a string by referring to the
-       length that is  returned  by  pcre_copy_substring()  and  pcre_get_sub-
+       A substring that contains a binary zero is correctly extracted and  has
+       a  further zero added on the end, but the result is not, of course, a C
+       string.  However, you can process such a string  by  referring  to  the
+       length  that  is  returned  by  pcre_copy_substring() and pcre_get_sub-
        string().  Unfortunately, the interface to pcre_get_substring_list() is
-       not adequate for handling strings containing binary zeros, because  the
+       not  adequate for handling strings containing binary zeros, because the
        end of the final string is not independently indicated.


-       The  first  three  arguments  are the same for all three of these func-
-       tions: subject is the subject string that has  just  been  successfully
+       The first three arguments are the same for all  three  of  these  func-
+       tions:  subject  is  the subject string that has just been successfully
        matched, ovector is a pointer to the vector of integer offsets that was
        passed to pcre_exec(), and stringcount is the number of substrings that
-       were  captured  by  the match, including the substring that matched the
+       were captured by the match, including the substring  that  matched  the
        entire regular expression. This is the value returned by pcre_exec() if
-       it  is greater than zero. If pcre_exec() returned zero, indicating that
-       it ran out of space in ovector, the value passed as stringcount  should
+       it is greater than zero. If pcre_exec() returned zero, indicating  that
+       it  ran out of space in ovector, the value passed as stringcount should
        be the number of elements in the vector divided by three.
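
        The choice of stringcount might be wrapped up as follows. The
        helper name stringcount_for() is hypothetical, and the caller is
        assumed to have already handled negative (error) returns from
        pcre_exec().

```c
/* Hypothetical helper, not a PCRE function.
   rc is a non-negative return value from pcre_exec(); ovecsize is
   the number of ints in the ovector that was passed to it. */
static int stringcount_for(int rc, int ovecsize)
{
    /* A positive rc is already the number of substrings. Zero means
       the vector overflowed, so use the number of pairs that fit. */
    return rc > 0 ? rc : ovecsize / 3;
}
```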


-       The  functions pcre_copy_substring() and pcre_get_substring() extract a
-       single substring, whose number is given as  stringnumber.  A  value  of
-       zero  extracts  the  substring that matched the entire pattern, whereas
-       higher values  extract  the  captured  substrings.  For  pcre_copy_sub-
-       string(),  the  string  is  placed  in buffer, whose length is given by
-       buffersize, while for pcre_get_substring() a new  block  of  memory  is
-       obtained  via  pcre_malloc,  and its address is returned via stringptr.
-       The yield of the function is the length of the  string,  not  including
+       The functions pcre_copy_substring() and pcre_get_substring() extract  a
+       single  substring,  whose  number  is given as stringnumber. A value of
+       zero extracts the substring that matched the  entire  pattern,  whereas
+       higher  values  extract  the  captured  substrings.  For pcre_copy_sub-
+       string(), the string is placed in buffer,  whose  length  is  given  by
+       buffersize,  while  for  pcre_get_substring()  a new block of memory is
+       obtained via pcre_malloc, and its address is  returned  via  stringptr.
+       The  yield  of  the function is the length of the string, not including
        the terminating zero, or one of these error codes:


          PCRE_ERROR_NOMEMORY       (-6)


-       The  buffer  was too small for pcre_copy_substring(), or the attempt to
+       The buffer was too small for pcre_copy_substring(), or the  attempt  to
        get memory failed for pcre_get_substring().


          PCRE_ERROR_NOSUBSTRING    (-7)


        There is no substring whose number is stringnumber.


-       The pcre_get_substring_list()  function  extracts  all  available  sub-
-       strings  and  builds  a list of pointers to them. All this is done in a
+       The  pcre_get_substring_list()  function  extracts  all  available sub-
+       strings and builds a list of pointers to them. All this is  done  in  a
        single block of memory that is obtained via pcre_malloc. The address of
-       the  memory  block  is returned via listptr, which is also the start of
-       the list of string pointers. The end of the list is marked  by  a  NULL
-       pointer.  The  yield  of  the function is zero if all went well, or the
+       the memory block is returned via listptr, which is also  the  start  of
+       the  list  of  string pointers. The end of the list is marked by a NULL
+       pointer. The yield of the function is zero if all  went  well,  or  the
        error code


          PCRE_ERROR_NOMEMORY       (-6)


        if the attempt to get the memory block failed.


-       When any of these functions encounter a substring that is unset,  which
-       can  happen  when  capturing subpattern number n+1 matches some part of
-       the subject, but subpattern n has not been used at all, they return  an
+       When  any of these functions encounter a substring that is unset, which
+       can happen when capturing subpattern number n+1 matches  some  part  of
+       the  subject, but subpattern n has not been used at all, they return an
        empty string. This can be distinguished from a genuine zero-length sub-
-       string by inspecting the appropriate offset in ovector, which is  nega-
+       string  by inspecting the appropriate offset in ovector, which is nega-
        tive for unset substrings.


-       The  two convenience functions pcre_free_substring() and pcre_free_sub-
-       string_list() can be used to free the memory  returned  by  a  previous
+       The two convenience functions pcre_free_substring() and  pcre_free_sub-
+       string_list()  can  be  used  to free the memory returned by a previous
        call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-
-       tively. They do nothing more than  call  the  function  pointed  to  by
-       pcre_free,  which  of course could be called directly from a C program.
-       However, PCRE is used in some situations where it is linked via a  spe-
-       cial   interface  to  another  programming  language  that  cannot  use
-       pcre_free directly; it is for these cases that the functions  are  pro-
+       tively.  They  do  nothing  more  than  call the function pointed to by
+       pcre_free, which of course could be called directly from a  C  program.
+       However,  PCRE is used in some situations where it is linked via a spe-
+       cial  interface  to  another  programming  language  that  cannot   use
+       pcre_free  directly;  it is for these cases that the functions are pro-
        vided.



@@ -3183,7 +3185,7 @@
             int stringcount, const char *stringname,
             const char **stringptr);


-       To  extract a substring by name, you first have to find the associated
+       To extract a substring by name, you first have to find the  associated
        number. For example, for this pattern


          (a+)b(?<xxx>\d+)...
@@ -3192,35 +3194,35 @@
        be unique (PCRE_DUPNAMES was not set), you can find the number from the
        name by calling pcre_get_stringnumber(). The first argument is the com-
        piled pattern, and the second is the name. The yield of the function is
-       the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if  there  is  no
+       the  subpattern  number,  or PCRE_ERROR_NOSUBSTRING (-7) if there is no
        subpattern of that name.


        Given the number, you can extract the substring directly, or use one of
        the functions described in the previous section. For convenience, there
        are also two functions that do the whole job.


-       Most    of    the    arguments   of   pcre_copy_named_substring()   and
-       pcre_get_named_substring() are the same  as  those  for  the  similarly
-       named  functions  that extract by number. As these are described in the
-       previous section, they are not re-described here. There  are  just  two
+       Most   of   the   arguments    of    pcre_copy_named_substring()    and
+       pcre_get_named_substring()  are  the  same  as  those for the similarly
+       named functions that extract by number. As these are described  in  the
+       previous  section,  they  are not re-described here. There are just two
        differences:


-       First,  instead  of a substring number, a substring name is given. Sec-
+       First, instead of a substring number, a substring name is  given.  Sec-
        ond, there is an extra argument, given at the start, which is a pointer
-       to  the compiled pattern. This is needed in order to gain access to the
+       to the compiled pattern. This is needed in order to gain access to  the
        name-to-number translation table.


-       These functions call pcre_get_stringnumber(), and if it succeeds,  they
-       then  call  pcre_copy_substring() or pcre_get_substring(), as appropri-
-       ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate  names,  the
+       These  functions call pcre_get_stringnumber(), and if it succeeds, they
+       then call pcre_copy_substring() or pcre_get_substring(),  as  appropri-
+       ate.  NOTE:  If PCRE_DUPNAMES is set and there are duplicate names, the
        behaviour may not be what you want (see the next section).


        Warning: If the pattern uses the (?| feature to set up multiple subpat-
-       terns with the same number, as described in the  section  on  duplicate
-       subpattern  numbers  in  the  pcrepattern page, you cannot use names to
-       distinguish the different subpatterns, because names are  not  included
-       in  the compiled code. The matching process uses only numbers. For this
-       reason, the use of different names for subpatterns of the  same  number
+       terns  with  the  same number, as described in the section on duplicate
+       subpattern numbers in the pcrepattern page, you  cannot  use  names  to
+       distinguish  the  different subpatterns, because names are not included
+       in the compiled code. The matching process uses only numbers. For  this
+       reason,  the  use of different names for subpatterns of the same number
        causes an error at compile time.



@@ -3229,76 +3231,76 @@
        int pcre_get_stringtable_entries(const pcre *code,
             const char *name, char **first, char **last);


-       When  a  pattern  is  compiled with the PCRE_DUPNAMES option, names for
-       subpatterns are not required to be unique. (Duplicate names are  always
-       allowed  for subpatterns with the same number, created by using the (?|
-       feature. Indeed, if such subpatterns are named, they  are  required  to
+       When a pattern is compiled with the  PCRE_DUPNAMES  option,  names  for
+       subpatterns  are not required to be unique. (Duplicate names are always
+       allowed for subpatterns with the same number, created by using the  (?|
+       feature.  Indeed,  if  such subpatterns are named, they are required to
        use the same names.)


        Normally, patterns with duplicate names are such that in any one match,
-       only one of the named subpatterns participates. An example is shown  in
+       only  one of the named subpatterns participates. An example is shown in
        the pcrepattern documentation.


-       When    duplicates   are   present,   pcre_copy_named_substring()   and
-       pcre_get_named_substring() return the first substring corresponding  to
-       the  given  name  that  is set. If none are set, PCRE_ERROR_NOSUBSTRING
-       (-7) is returned; no  data  is  returned.  The  pcre_get_stringnumber()
-       function  returns one of the numbers that are associated with the name,
+       When   duplicates   are   present,   pcre_copy_named_substring()    and
+       pcre_get_named_substring()  return the first substring corresponding to
+       the given name that is set. If  none  are  set,  PCRE_ERROR_NOSUBSTRING
+       (-7)  is  returned;  no  data  is returned. The pcre_get_stringnumber()
+       function returns one of the numbers that are associated with the  name,
        but it is not defined which it is.


-       If you want to get full details of all captured substrings for a  given
-       name,  you  must  use  the pcre_get_stringtable_entries() function. The
+       If  you want to get full details of all captured substrings for a given
+       name, you must use  the  pcre_get_stringtable_entries()  function.  The
        first argument is the compiled pattern, and the second is the name. The
-       third  and  fourth  are  pointers to variables which are updated by the
+       third and fourth are pointers to variables which  are  updated  by  the
        function. After it has run, they point to the first and last entries in
-       the  name-to-number  table  for  the  given  name.  The function itself
-       returns the length of each entry,  or  PCRE_ERROR_NOSUBSTRING  (-7)  if
-       there  are none. The format of the table is described above in the sec-
-       tion entitled Information about a pattern above.  Given all  the  rele-
-       vant  entries  for the name, you can extract each of their numbers, and
+       the name-to-number table  for  the  given  name.  The  function  itself
+       returns  the  length  of  each entry, or PCRE_ERROR_NOSUBSTRING (-7) if
+       there are none. The format of the table is described above in the  sec-
+       tion  entitled  Information about a pattern above.  Given all the rele-
+       vant entries for the name, you can extract each of their  numbers,  and
        hence the captured data, if any.
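
A minimal sketch of walking the entries that pcre_get_stringtable_entries() identifies. It assumes the 8-bit library's name-table layout (two bytes holding the group number, most significant byte first, followed by the zero-terminated name). The helper name collect_group_numbers() is hypothetical and not part of PCRE.

```c
#include <assert.h>
#include <stddef.h>

/* Collect the group numbers stored in the name-table entries that lie
 * between `first` and `last` (inclusive). `entry_size` is the value
 * returned by pcre_get_stringtable_entries(). Returns how many numbers
 * were written to `out` (at most `max_out`).
 * Assumption: 8-bit library layout, number in the first two bytes. */
static int collect_group_numbers(const char *first, const char *last,
                                 int entry_size, int *out, int max_out)
{
    int count = 0;
    const char *entry;

    assert(entry_size >= 3);   /* two number bytes plus at least a NUL */
    for (entry = first; entry <= last && count < max_out;
         entry += entry_size) {
        const unsigned char *e = (const unsigned char *)entry;
        out[count++] = (e[0] << 8) | e[1];   /* most significant byte first */
    }
    return count;
}
```

Each number obtained this way can then be used as an index into the ovector pairs filled in by pcre_exec() to see whether that particular group was set.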



FINDING ALL POSSIBLE MATCHES

-       The traditional matching function uses a  similar  algorithm  to  Perl,
+       The  traditional  matching  function  uses a similar algorithm to Perl,
        which stops when it finds the first match, starting at a given point in
-       the subject. If you want to find all possible matches, or  the  longest
-       possible  match,  consider using the alternative matching function (see
-       below) instead. If you cannot use the alternative function,  but  still
-       need  to  find all possible matches, you can kludge it up by making use
+       the  subject.  If you want to find all possible matches, or the longest
+       possible match, consider using the alternative matching  function  (see
+       below)  instead.  If you cannot use the alternative function, but still
+       need to find all possible matches, you can kludge it up by  making  use
        of the callout facility, which is described in the pcrecallout documen-
        tation.


        What you have to do is to insert a callout right at the end of the pat-
-       tern.  When your callout function is called, extract and save the  cur-
-       rent  matched  substring.  Then  return  1, which forces pcre_exec() to
-       backtrack and try other alternatives. Ultimately, when it runs  out  of
+       tern.   When your callout function is called, extract and save the cur-
+       rent matched substring. Then return  1,  which  forces  pcre_exec()  to
+       backtrack  and  try other alternatives. Ultimately, when it runs out of
        matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.



OBTAINING AN ESTIMATE OF STACK USAGE

-       Matching  certain  patterns  using pcre_exec() can use a lot of process
-       stack, which in certain environments can be  rather  limited  in  size.
-       Some  users  find it helpful to have an estimate of the amount of stack
-       that is used by pcre_exec(), to help  them  set  recursion  limits,  as
-       described  in  the pcrestack documentation. The estimate that is output
+       Matching certain patterns using pcre_exec() can use a  lot  of  process
+       stack,  which  in  certain  environments can be rather limited in size.
+       Some users find it helpful to have an estimate of the amount  of  stack
+       that  is  used  by  pcre_exec(),  to help them set recursion limits, as
+       described in the pcrestack documentation. The estimate that  is  output
        by pcretest when called with the -m and -C options is obtained by call-
-       ing  pcre_exec with the values NULL, NULL, NULL, -999, and -999 for its
+       ing pcre_exec with the values NULL, NULL, NULL, -999, and -999 for  its
        first five arguments.


-       Normally, if  its  first  argument  is  NULL,  pcre_exec()  immediately
-       returns  the negative error code PCRE_ERROR_NULL, but with this special
-       combination of arguments, it returns instead a  negative  number  whose
-       absolute  value  is the approximate stack frame size in bytes. (A nega-
-       tive number is used so that it is clear that no  match  has  happened.)
-       The  value  is  approximate  because  in some cases, recursive calls to
+       Normally,  if  its  first  argument  is  NULL,  pcre_exec() immediately
+       returns the negative error code PCRE_ERROR_NULL, but with this  special
+       combination  of  arguments,  it returns instead a negative number whose
+       absolute value is the approximate stack frame size in bytes.  (A  nega-
+       tive  number  is  used so that it is clear that no match has happened.)
+       The value is approximate because in  some  cases,  recursive  calls  to
        pcre_exec() occur when there are one or two additional variables on the
        stack.


-       If  PCRE  has  been  compiled  to use the heap instead of the stack for
-       recursion, the value returned  is  the  size  of  each  block  that  is
+       If PCRE has been compiled to use the heap  instead  of  the  stack  for
+       recursion,  the  value  returned  is  the  size  of  each block that is
        obtained from the heap.



@@ -3309,26 +3311,26 @@
             int options, int *ovector, int ovecsize,
             int *workspace, int wscount);


-       The  function  pcre_dfa_exec()  is  called  to  match  a subject string
-       against a compiled pattern, using a matching algorithm that  scans  the
-       subject  string  just  once, and does not backtrack. This has different
-       characteristics to the normal algorithm, and  is  not  compatible  with
-       Perl.  Some  of the features of PCRE patterns are not supported. Never-
-       theless, there are times when this kind of matching can be useful.  For
-       a  discussion  of  the  two matching algorithms, and a list of features
-       that pcre_dfa_exec() does not support, see the pcrematching  documenta-
+       The function pcre_dfa_exec()  is  called  to  match  a  subject  string
+       against  a  compiled pattern, using a matching algorithm that scans the
+       subject string just once, and does not backtrack.  This  has  different
+       characteristics  to  the  normal  algorithm, and is not compatible with
+       Perl. Some of the features of PCRE patterns are not  supported.  Never-
+       theless,  there are times when this kind of matching can be useful. For
+       a discussion of the two matching algorithms, and  a  list  of  features
+       that  pcre_dfa_exec() does not support, see the pcrematching documenta-
        tion.


-       The  arguments  for  the  pcre_dfa_exec()  function are the same as for
+       The arguments for the pcre_dfa_exec() function  are  the  same  as  for
        pcre_exec(), plus two extras. The ovector argument is used in a differ-
-       ent  way,  and  this is described below. The other common arguments are
-       used in the same way as for pcre_exec(), so their  description  is  not
+       ent way, and this is described below. The other  common  arguments  are
+       used  in  the  same way as for pcre_exec(), so their description is not
        repeated here.


-       The  two  additional  arguments provide workspace for the function. The
-       workspace vector should contain at least 20 elements. It  is  used  for
+       The two additional arguments provide workspace for  the  function.  The
+       workspace  vector  should  contain at least 20 elements. It is used for
        keeping  track  of  multiple  paths  through  the  pattern  tree.  More
-       workspace will be needed for patterns and subjects where  there  are  a
+       workspace  will  be  needed for patterns and subjects where there are a
        lot of potential matches.


        Here is an example of a simple call to pcre_dfa_exec():
@@ -3350,55 +3352,55 @@


    Option bits for pcre_dfa_exec()


-       The  unused  bits  of  the options argument for pcre_dfa_exec() must be
-       zero. The only bits  that  may  be  set  are  PCRE_ANCHORED,  PCRE_NEW-
+       The unused bits of the options argument  for  pcre_dfa_exec()  must  be
+       zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW-
        LINE_xxx,        PCRE_NOTBOL,        PCRE_NOTEOL,        PCRE_NOTEMPTY,
-       PCRE_NOTEMPTY_ATSTART,      PCRE_NO_UTF8_CHECK,       PCRE_BSR_ANYCRLF,
-       PCRE_BSR_UNICODE,  PCRE_NO_START_OPTIMIZE, PCRE_PARTIAL_HARD, PCRE_PAR-
-       TIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART.  All but  the  last
-       four  of  these  are  exactly  the  same  as  for pcre_exec(), so their
+       PCRE_NOTEMPTY_ATSTART,       PCRE_NO_UTF8_CHECK,      PCRE_BSR_ANYCRLF,
+       PCRE_BSR_UNICODE, PCRE_NO_START_OPTIMIZE, PCRE_PARTIAL_HARD,  PCRE_PAR-
+       TIAL_SOFT,  PCRE_DFA_SHORTEST,  and PCRE_DFA_RESTART.  All but the last
+       four of these are  exactly  the  same  as  for  pcre_exec(),  so  their
        description is not repeated here.


          PCRE_PARTIAL_HARD
          PCRE_PARTIAL_SOFT


-       These have the same general effect as they do for pcre_exec(), but  the
-       details  are  slightly  different.  When  PCRE_PARTIAL_HARD  is set for
-       pcre_dfa_exec(), it returns PCRE_ERROR_PARTIAL if the end of  the  sub-
-       ject  is  reached  and there is still at least one matching possibility
+       These  have the same general effect as they do for pcre_exec(), but the
+       details are slightly  different.  When  PCRE_PARTIAL_HARD  is  set  for
+       pcre_dfa_exec(),  it  returns PCRE_ERROR_PARTIAL if the end of the sub-
+       ject is reached and there is still at least  one  matching  possibility
        that requires additional characters. This happens even if some complete
        matches have also been found. When PCRE_PARTIAL_SOFT is set, the return
        code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end
-       of  the  subject  is  reached, there have been no complete matches, but
-       there is still at least one matching possibility. The  portion  of  the
-       string  that  was inspected when the longest partial match was found is
-       set as the first matching string  in  both  cases.   There  is  a  more
-       detailed  discussion  of partial and multi-segment matching, with exam-
+       of the subject is reached, there have been  no  complete  matches,  but
+       there  is  still  at least one matching possibility. The portion of the
+       string that was inspected when the longest partial match was  found  is
+       set  as  the  first  matching  string  in  both cases.  There is a more
+       detailed discussion of partial and multi-segment matching,  with  exam-
        ples, in the pcrepartial documentation.


          PCRE_DFA_SHORTEST


-       Setting the PCRE_DFA_SHORTEST option causes the matching  algorithm  to
+       Setting  the  PCRE_DFA_SHORTEST option causes the matching algorithm to
        stop as soon as it has found one match. Because of the way the alterna-
-       tive algorithm works, this is necessarily the shortest  possible  match
+       tive  algorithm  works, this is necessarily the shortest possible match
        at the first possible matching point in the subject string.


          PCRE_DFA_RESTART


        When pcre_dfa_exec() returns a partial match, it is possible to call it
-       again, with additional subject characters, and have  it  continue  with
-       the  same match. The PCRE_DFA_RESTART option requests this action; when
-       it is set, the workspace and wscount options must  reference  the  same
-       vector  as  before  because data about the match so far is left in them
+       again,  with  additional  subject characters, and have it continue with
+       the same match. The PCRE_DFA_RESTART option requests this action;  when
+       it  is  set,  the workspace and wscount options must reference the same
+       vector as before because data about the match so far is  left  in  them
        after a partial match. There is more discussion of this facility in the
        pcrepartial documentation.


    Successful returns from pcre_dfa_exec()


-       When  pcre_dfa_exec()  succeeds, it may have matched more than one sub-
+       When pcre_dfa_exec() succeeds, it may have matched more than  one  sub-
        string in the subject. Note, however, that all the matches from one run
-       of  the  function  start  at the same point in the subject. The shorter
-       matches are all initial substrings of the longer matches. For  example,
+       of the function start at the same point in  the  subject.  The  shorter
+       matches  are all initial substrings of the longer matches. For example,
        if the pattern


          <.*>
@@ -3413,63 +3415,63 @@
          <something> <something else>
          <something> <something else> <something further>


-       On  success,  the  yield of the function is a number greater than zero,
-       which is the number of matched substrings.  The  substrings  themselves
-       are  returned  in  ovector. Each string uses two elements; the first is
-       the offset to the start, and the second is the offset to  the  end.  In
-       fact,  all  the  strings  have the same start offset. (Space could have
-       been saved by giving this only once, but it was decided to retain  some
-       compatibility  with  the  way pcre_exec() returns data, even though the
+       On success, the yield of the function is a number  greater  than  zero,
+       which  is  the  number of matched substrings. The substrings themselves
+       are returned in ovector. Each string uses two elements;  the  first  is
+       the  offset  to  the start, and the second is the offset to the end. In
+       fact, all the strings have the same start  offset.  (Space  could  have
+       been  saved by giving this only once, but it was decided to retain some
+       compatibility with the way pcre_exec() returns data,  even  though  the
        meaning of the strings is different.)


        The strings are returned in reverse order of length; that is, the long-
-       est  matching  string is given first. If there were too many matches to
-       fit into ovector, the yield of the function is zero, and the vector  is
-       filled  with  the  longest matches. Unlike pcre_exec(), pcre_dfa_exec()
+       est matching string is given first. If there were too many  matches  to
+       fit  into ovector, the yield of the function is zero, and the vector is
+       filled with the longest matches.  Unlike  pcre_exec(),  pcre_dfa_exec()
        can use the entire ovector for returning matched strings.
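
The layout of ovector after a successful pcre_dfa_exec() call can be read as sketched here. The helper names dfa_match_count() and dfa_copy_match() are hypothetical, and no PCRE call is made; the caller is assumed to pass the return code and the vector that it already has.

```c
#include <string.h>

/* Number of usable offset pairs: a positive return code is the count,
 * zero means the vector was filled (ovecsize/2 pairs hold the longest
 * matches), and a negative code is an error with no matches. */
static int dfa_match_count(int rc, int ovecsize)
{
    if (rc > 0) return rc;
    if (rc == 0) return ovecsize / 2;
    return 0;
}

/* Copy match number i (0 is the longest) into buf as a C string.
 * Returns the match length, or -1 if buf is too small. */
static int dfa_copy_match(const char *subject, const int *ovector, int i,
                          char *buf, size_t bufsize)
{
    int start = ovector[2 * i];       /* same value for every match */
    int end = ovector[2 * i + 1];     /* decreases as i increases */
    size_t len = (size_t)(end - start);

    if (len + 1 > bufsize) return -1;
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

For instance, with the illustrative subject "x <a> <b>" and the pattern <.*>, a vector holding {2, 9, 2, 5} describes the two matches "<a> <b>" and "<a>", longest first.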


    Error returns from pcre_dfa_exec()


-       The pcre_dfa_exec() function returns a negative number when  it  fails.
-       Many  of  the  errors  are  the  same as for pcre_exec(), and these are
-       described above.  There are in addition the following errors  that  are
+       The  pcre_dfa_exec()  function returns a negative number when it fails.
+       Many of the errors are the same  as  for  pcre_exec(),  and  these  are
+       described  above.   There are in addition the following errors that are
        specific to pcre_dfa_exec():


          PCRE_ERROR_DFA_UITEM      (-16)


-       This  return is given if pcre_dfa_exec() encounters an item in the pat-
-       tern that it does not support, for instance, the use of \C  or  a  back
+       This return is given if pcre_dfa_exec() encounters an item in the  pat-
+       tern  that  it  does not support, for instance, the use of \C or a back
        reference.


          PCRE_ERROR_DFA_UCOND      (-17)


-       This  return  is  given  if pcre_dfa_exec() encounters a condition item
-       that uses a back reference for the condition, or a test  for  recursion
+       This return is given if pcre_dfa_exec()  encounters  a  condition  item
+       that  uses  a back reference for the condition, or a test for recursion
        in a specific group. These are not supported.


          PCRE_ERROR_DFA_UMLIMIT    (-18)


-       This  return  is given if pcre_dfa_exec() is called with an extra block
-       that contains a setting of  the  match_limit  or  match_limit_recursion
-       fields.  This  is  not  supported (these fields are meaningless for DFA
+       This return is given if pcre_dfa_exec() is called with an  extra  block
+       that  contains  a  setting  of the match_limit or match_limit_recursion
+       fields. This is not supported (these fields  are  meaningless  for  DFA
        matching).


          PCRE_ERROR_DFA_WSSIZE     (-19)


-       This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the
+       This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the
        workspace vector.


          PCRE_ERROR_DFA_RECURSE    (-20)


-       When  a  recursive subpattern is processed, the matching function calls
-       itself recursively, using private vectors for  ovector  and  workspace.
-       This  error  is  given  if  the output vector is not large enough. This
+       When a recursive subpattern is processed, the matching  function  calls
+       itself  recursively,  using  private vectors for ovector and workspace.
+       This error is given if the output vector  is  not  large  enough.  This
        should be extremely rare, as a vector of size 1000 is used.
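
For logging, the five codes listed above can be mapped to their names with a small lookup. This is only a sketch: dfa_error_name() is a hypothetical helper, and the numeric values are copied from the text rather than taken from pcre.h, which real code should include instead.

```c
#include <stddef.h>

/* Return the name of an error code that is specific to pcre_dfa_exec(),
 * or NULL for any other value. Values are those listed in the text. */
static const char *dfa_error_name(int rc)
{
    switch (rc) {
    case -16: return "PCRE_ERROR_DFA_UITEM";    /* unsupported pattern item */
    case -17: return "PCRE_ERROR_DFA_UCOND";    /* unsupported condition */
    case -18: return "PCRE_ERROR_DFA_UMLIMIT";  /* match limits set in extra */
    case -19: return "PCRE_ERROR_DFA_WSSIZE";   /* workspace exhausted */
    case -20: return "PCRE_ERROR_DFA_RECURSE";  /* recursion vector too small */
    default:  return NULL;
    }
}
```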



SEE ALSO

-       pcre16(3),  pcrebuild(3),  pcrecallout(3),  pcrecpp(3)(3),   pcrematch-
+       pcre16(3),   pcrebuild(3),  pcrecallout(3),  pcrecpp(3)(3),  pcrematch-
        ing(3), pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3),
        pcrestack(3).


@@ -3483,7 +3485,7 @@

REVISION

-       Last updated: 24 February 2012
+       Last updated: 14 April 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------


@@ -4697,16 +4699,17 @@
        means that the rest of the string may start with a malformed UTF  char-
        acter.  This  has  undefined  results,  because PCRE assumes that it is
        dealing with valid UTF strings (and by default it checks  this  at  the
-       start of processing unless the PCRE_NO_UTF8_CHECK option is used).
+       start     of    processing    unless    the    PCRE_NO_UTF8_CHECK    or
+       PCRE_NO_UTF16_CHECK option is used).


-       PCRE  does  not  allow \C to appear in lookbehind assertions (described
-       below) in a UTF mode, because this would make it impossible  to  calcu-
+       PCRE does not allow \C to appear in  lookbehind  assertions  (described
+       below)  in  a UTF mode, because this would make it impossible to calcu-
        late the length of the lookbehind.


        In general, the \C escape sequence is best avoided. However, one way of
-       using it that avoids the problem of malformed UTF characters is to  use
-       a  lookahead to check the length of the next character, as in this pat-
-       tern, which could be used with a UTF-8 string (ignore white  space  and
+       using  it that avoids the problem of malformed UTF characters is to use
+       a lookahead to check the length of the next character, as in this  pat-
+       tern,  which  could be used with a UTF-8 string (ignore white space and
        line breaks):


          (?| (?=[\x00-\x7f])(\C) |
@@ -4714,11 +4717,11 @@
              (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) |
              (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))


-       A  group  that starts with (?| resets the capturing parentheses numbers
-       in each alternative (see "Duplicate  Subpattern  Numbers"  below).  The
-       assertions  at  the start of each branch check the next UTF-8 character
-       for values whose encoding uses 1, 2, 3, or 4 bytes,  respectively.  The
-       character's  individual bytes are then captured by the appropriate num-
+       A group that starts with (?| resets the capturing  parentheses  numbers
+       in  each  alternative  (see  "Duplicate Subpattern Numbers" below). The
+       assertions at the start of each branch check the next  UTF-8  character
+       for  values  whose encoding uses 1, 2, 3, or 4 bytes, respectively. The
+       character's individual bytes are then captured by the appropriate  num-
        ber of groups.
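
The byte counts that the lookaheads select can be written out as a plain function. This sketch assumes the standard UTF-8 boundaries; the two-byte range (0x80 to 0x7ff) does not appear in the lines shown here and is taken from the UTF-8 encoding rules. The name utf8_encoded_length() is hypothetical.

```c
/* Number of bytes that UTF-8 uses for code point cp, following the
 * ranges tested by the lookaheads in the pattern above. Returns 0 for
 * values beyond the last range in the pattern. */
static int utf8_encoded_length(unsigned long cp)
{
    if (cp <= 0x7fUL)     return 1;   /* [\x00-\x7f] */
    if (cp <= 0x7ffUL)    return 2;   /* assumed: 0x80 to 0x7ff */
    if (cp <= 0xffffUL)   return 3;   /* [\x{800}-\x{ffff}] */
    if (cp <= 0x1fffffUL) return 4;   /* [\x{10000}-\x{1fffff}] */
    return 0;
}
```

The return value corresponds to the number of (\C) groups in the branch that matches a given character.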



@@ -4728,109 +4731,109 @@
        closing square bracket. A closing square bracket on its own is not spe-
        cial by default.  However, if the PCRE_JAVASCRIPT_COMPAT option is set,
        a lone closing square bracket causes a compile-time error. If a closing
-       square bracket is required as a member of the class, it should  be  the
-       first  data  character  in  the  class (after an initial circumflex, if
+       square  bracket  is required as a member of the class, it should be the
+       first data character in the class  (after  an  initial  circumflex,  if
        present) or escaped with a backslash.


-       A character class matches a single character in the subject. In  a  UTF
-       mode,  the  character  may  be  more than one data unit long. A matched
+       A  character  class matches a single character in the subject. In a UTF
+       mode, the character may be more than one  data  unit  long.  A  matched
        character must be in the set of characters defined by the class, unless
-       the  first  character in the class definition is a circumflex, in which
+       the first character in the class definition is a circumflex,  in  which
        case the subject character must not be in the set defined by the class.
-       If  a  circumflex is actually required as a member of the class, ensure
+       If a circumflex is actually required as a member of the  class,  ensure
        it is not the first character, or escape it with a backslash.


-       For example, the character class [aeiou] matches any lower case  vowel,
-       while  [^aeiou]  matches  any character that is not a lower case vowel.
+       For  example, the character class [aeiou] matches any lower case vowel,
+       while [^aeiou] matches any character that is not a  lower  case  vowel.
        Note that a circumflex is just a convenient notation for specifying the
-       characters  that  are in the class by enumerating those that are not. A
-       class that starts with a circumflex is not an assertion; it still  con-
-       sumes  a  character  from the subject string, and therefore it fails if
+       characters that are in the class by enumerating those that are  not.  A
+       class  that starts with a circumflex is not an assertion; it still con-
+       sumes a character from the subject string, and therefore  it  fails  if
        the current pointer is at the end of the string.


-       In UTF-8  (UTF-16)  mode,  characters  with  values  greater  than  255
-       (0xffff)  can be included in a class as a literal string of data units,
+       In  UTF-8  (UTF-16)  mode,  characters  with  values  greater  than 255
+       (0xffff) can be included in a class as a literal string of data  units,
        or by using the \x{ escaping mechanism.


-       When caseless matching is set, any letters in a  class  represent  both
-       their  upper  case  and lower case versions, so for example, a caseless
-       [aeiou] matches "A" as well as "a", and a caseless  [^aeiou]  does  not
-       match  "A", whereas a caseful version would. In a UTF mode, PCRE always
-       understands the concept of case for characters whose  values  are  less
-       than  128, so caseless matching is always possible. For characters with
-       higher values, the concept of case is supported  if  PCRE  is  compiled
-       with  Unicode  property support, but not otherwise.  If you want to use
-       caseless matching in a UTF mode for characters 128 and above, you  must
-       ensure  that  PCRE is compiled with Unicode property support as well as
+       When  caseless  matching  is set, any letters in a class represent both
+       their upper case and lower case versions, so for  example,  a  caseless
+       [aeiou]  matches  "A"  as well as "a", and a caseless [^aeiou] does not
+       match "A", whereas a caseful version would. In a UTF mode, PCRE  always
+       understands  the  concept  of case for characters whose values are less
+       than 128, so caseless matching is always possible. For characters  with
+       higher  values,  the  concept  of case is supported if PCRE is compiled
+       with Unicode property support, but not otherwise.  If you want  to  use
+       caseless  matching in a UTF mode for characters 128 and above, you must
+       ensure that PCRE is compiled with Unicode property support as  well  as
        with UTF support.


-       Characters that might indicate line breaks are  never  treated  in  any
-       special  way  when  matching  character  classes,  whatever line-ending
-       sequence is in  use,  and  whatever  setting  of  the  PCRE_DOTALL  and
+       Characters  that  might  indicate  line breaks are never treated in any
+       special way  when  matching  character  classes,  whatever  line-ending
+       sequence  is  in  use,  and  whatever  setting  of  the PCRE_DOTALL and
        PCRE_MULTILINE options is used. A class such as [^a] always matches one
        of these characters.


-       The minus (hyphen) character can be used to specify a range of  charac-
-       ters  in  a  character  class.  For  example,  [d-m] matches any letter
-       between d and m, inclusive. If a  minus  character  is  required  in  a
-       class,  it  must  be  escaped  with a backslash or appear in a position
-       where it cannot be interpreted as indicating a range, typically as  the
+       The  minus (hyphen) character can be used to specify a range of charac-
+       ters in a character  class.  For  example,  [d-m]  matches  any  letter
+       between  d  and  m,  inclusive.  If  a minus character is required in a
+       class, it must be escaped with a backslash  or  appear  in  a  position
+       where  it cannot be interpreted as indicating a range, typically as the
        first or last character in the class.


        It is not possible to have the literal character "]" as the end charac-
-       ter of a range. A pattern such as [W-]46] is interpreted as a class  of
-       two  characters ("W" and "-") followed by a literal string "46]", so it
-       would match "W46]" or "-46]". However, if the "]"  is  escaped  with  a
-       backslash  it is interpreted as the end of range, so [W-\]46] is inter-
-       preted as a class containing a range followed by two other  characters.
-       The  octal or hexadecimal representation of "]" can also be used to end
+       ter  of a range. A pattern such as [W-]46] is interpreted as a class of
+       two characters ("W" and "-") followed by a literal string "46]", so  it
+       would  match  "W46]"  or  "-46]". However, if the "]" is escaped with a
+       backslash it is interpreted as the end of range, so [W-\]46] is  inter-
+       preted  as a class containing a range followed by two other characters.
+       The octal or hexadecimal representation of "]" can also be used to  end
        a range.


-       Ranges operate in the collating sequence of character values. They  can
-       also   be  used  for  characters  specified  numerically,  for  example
-       [\000-\037]. Ranges can include any characters that are valid  for  the
+       Ranges  operate in the collating sequence of character values. They can
+       also  be  used  for  characters  specified  numerically,  for   example
+       [\000-\037].  Ranges  can include any characters that are valid for the
        current mode.


        If a range that includes letters is used when caseless matching is set,
        it matches the letters in either case. For example, [W-c] is equivalent
-       to  [][\\^_`wxyzabc],  matched  caselessly,  and  in a non-UTF mode, if
-       character tables for a French locale are in  use,  [\xc8-\xcb]  matches
-       accented  E  characters  in both cases. In UTF modes, PCRE supports the
-       concept of case for characters with values greater than 128  only  when
+       to [][\\^_`wxyzabc], matched caselessly, and  in  a  non-UTF  mode,  if
+       character  tables  for  a French locale are in use, [\xc8-\xcb] matches
+       accented E characters in both cases. In UTF modes,  PCRE  supports  the
+       concept  of  case for characters with values greater than 128 only when
        it is compiled with Unicode property support.
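
The [W-c] equivalence can be checked with a short model of caseless range matching. This is a sketch for single-byte characters in the default "C" locale only, not PCRE's implementation, and in_range_caseless() is a hypothetical name.

```c
#include <ctype.h>

/* Model of a caseless range [lo-hi]: ch matches if it, or its other-case
 * form, lies within the range. Assumes the "C" locale (ASCII letters). */
static int in_range_caseless(unsigned char lo, unsigned char hi,
                             unsigned char ch)
{
    unsigned char lower = (unsigned char)tolower(ch);
    unsigned char upper = (unsigned char)toupper(ch);

    return (ch >= lo && ch <= hi) ||
           (lower >= lo && lower <= hi) ||
           (upper >= lo && upper <= hi);
}
```

With lo = 'W' and hi = 'c', the function accepts "w" (because "W" is in the range) and "A" (because "a" is), but rejects "d" and "V", which agrees with the expanded class given in the text.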


-       The  character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v, \V,
+       The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,  \V,
        \w, and \W may appear in a character class, and add the characters that
-       they  match to the class. For example, [\dABCDEF] matches any hexadeci-
-       mal digit. In UTF modes, the PCRE_UCP option affects  the  meanings  of
-       \d,  \s,  \w  and  their upper case partners, just as it does when they
-       appear outside a character class, as described in the section  entitled
+       they match to the class. For example, [\dABCDEF] matches any  hexadeci-
+       mal  digit.  In  UTF modes, the PCRE_UCP option affects the meanings of
+       \d, \s, \w and their upper case partners, just as  it  does  when  they
+       appear  outside a character class, as described in the section entitled
        "Generic character types" above. The escape sequence \b has a different
-       meaning inside a character class; it matches the  backspace  character.
-       The  sequences  \B,  \N,  \R, and \X are not special inside a character
-       class. Like any other unrecognized escape sequences, they  are  treated
-       as  the literal characters "B", "N", "R", and "X" by default, but cause
+       meaning  inside  a character class; it matches the backspace character.
+       The sequences \B, \N, \R, and \X are not  special  inside  a  character
+       class.  Like  any other unrecognized escape sequences, they are treated
+       as the literal characters "B", "N", "R", and "X" by default, but  cause
        an error if the PCRE_EXTRA option is set.


-       A circumflex can conveniently be used with  the  upper  case  character
-       types  to specify a more restricted set of characters than the matching
-       lower case type.  For example, the class [^\W_] matches any  letter  or
+       A  circumflex  can  conveniently  be used with the upper case character
+       types to specify a more restricted set of characters than the  matching
+       lower  case  type.  For example, the class [^\W_] matches any letter or
        digit, but not underscore, whereas [\w] includes underscore. A positive
        character class should be read as "something OR something OR ..." and a
        negative class as "NOT something AND NOT something AND NOT ...".


-       The  only  metacharacters  that are recognized in character classes are
-       backslash, hyphen (only where it can be  interpreted  as  specifying  a
-       range),  circumflex  (only  at the start), opening square bracket (only
-       when it can be interpreted as introducing a POSIX class name - see  the
-       next  section),  and  the  terminating closing square bracket. However,
+       The only metacharacters that are recognized in  character  classes  are
+       backslash,  hyphen  (only  where  it can be interpreted as specifying a
+       range), circumflex (only at the start), opening  square  bracket  (only
+       when  it can be interpreted as introducing a POSIX class name - see the
+       next section), and the terminating  closing  square  bracket.  However,
        escaping other non-alphanumeric characters does no harm.



POSIX CHARACTER CLASSES

        Perl supports the POSIX notation for character classes. This uses names
-       enclosed  by  [: and :] within the enclosing square brackets. PCRE also
+       enclosed by [: and :] within the enclosing square brackets.  PCRE  also
        supports this notation. For example,


          [01[:alpha:]%]
@@ -4853,24 +4856,24 @@
          word     "word" characters (same as \w)
          xdigit   hexadecimal digits


-       The  "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
-       and space (32). Notice that this list includes the VT  character  (code
+       The "space" characters are HT (9), LF (10), VT (11), FF (12), CR  (13),
+       and  space  (32). Notice that this list includes the VT character (code
        11). This makes "space" different to \s, which does not include VT (for
        Perl compatibility).


-       The name "word" is a Perl extension, and "blank"  is  a  GNU  extension
-       from  Perl  5.8. Another Perl extension is negation, which is indicated
+       The  name  "word"  is  a Perl extension, and "blank" is a GNU extension
+       from Perl 5.8. Another Perl extension is negation, which  is  indicated
        by a ^ character after the colon. For example,


          [12[:^digit:]]


-       matches "1", "2", or any non-digit. PCRE (and Perl) also recognize  the
+       matches  "1", "2", or any non-digit. PCRE (and Perl) also recognize the
        POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but
        these are not supported, and an error is given if they are encountered.


-       By default, in UTF modes, characters with values greater  than  128  do
-       not  match any of the POSIX character classes. However, if the PCRE_UCP
-       option is passed to pcre_compile(), some of the classes are changed  so
+       By  default,  in  UTF modes, characters with values greater than 128 do
+       not match any of the POSIX character classes. However, if the  PCRE_UCP
+       option  is passed to pcre_compile(), some of the classes are changed so
        that Unicode character properties are used. This is achieved by replac-
        ing the POSIX classes by other sequences, as follows:


@@ -4883,31 +4886,31 @@
          [:upper:]  becomes  \p{Lu}
          [:word:]   becomes  \p{Xwd}


-       Negated versions, such as [:^alpha:] use \P instead of  \p.  The  other
+       Negated  versions,  such  as [:^alpha:] use \P instead of \p. The other
        POSIX classes are unchanged, and match only characters with code points
        less than 128.



VERTICAL BAR

-       Vertical bar characters are used to separate alternative patterns.  For
+       Vertical  bar characters are used to separate alternative patterns. For
        example, the pattern


          gilbert|sullivan


-       matches  either "gilbert" or "sullivan". Any number of alternatives may
-       appear, and an empty  alternative  is  permitted  (matching  the  empty
+       matches either "gilbert" or "sullivan". Any number of alternatives  may
+       appear,  and  an  empty  alternative  is  permitted (matching the empty
        string). The matching process tries each alternative in turn, from left
-       to right, and the first one that succeeds is used. If the  alternatives
-       are  within a subpattern (defined below), "succeeds" means matching the
+       to  right, and the first one that succeeds is used. If the alternatives
+       are within a subpattern (defined below), "succeeds" means matching  the
        rest of the main pattern as well as the alternative in the subpattern.



INTERNAL OPTION SETTING

-       The settings of the  PCRE_CASELESS,  PCRE_MULTILINE,  PCRE_DOTALL,  and
-       PCRE_EXTENDED  options  (which are Perl-compatible) can be changed from
-       within the pattern by  a  sequence  of  Perl  option  letters  enclosed
+       The  settings  of  the  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
+       PCRE_EXTENDED options (which are Perl-compatible) can be  changed  from
+       within  the  pattern  by  a  sequence  of  Perl option letters enclosed
        between "(?" and ")".  The option letters are


          i  for PCRE_CASELESS
@@ -4917,48 +4920,48 @@


        For example, (?im) sets caseless, multiline matching. It is also possi-
        ble to unset these options by preceding the letter with a hyphen, and a
-       combined  setting and unsetting such as (?im-sx), which sets PCRE_CASE-
-       LESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and  PCRE_EXTENDED,
-       is  also  permitted.  If  a  letter  appears  both before and after the
+       combined setting and unsetting such as (?im-sx), which sets  PCRE_CASE-
+       LESS  and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED,
+       is also permitted. If a  letter  appears  both  before  and  after  the
        hyphen, the option is unset.


-       The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and  PCRE_EXTRA
-       can  be changed in the same way as the Perl-compatible options by using
+       The  PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA
+       can be changed in the same way as the Perl-compatible options by  using
        the characters J, U and X respectively.


-       When one of these option changes occurs at  top  level  (that  is,  not
-       inside  subpattern parentheses), the change applies to the remainder of
+       When  one  of  these  option  changes occurs at top level (that is, not
+       inside subpattern parentheses), the change applies to the remainder  of
        the pattern that follows. If the change is placed right at the start of
        a pattern, PCRE extracts it into the global options (and it will there-
        fore show up in data extracted by the pcre_fullinfo() function).


-       An option change within a subpattern (see below for  a  description  of
-       subpatterns)  affects only that part of the subpattern that follows it,
+       An  option  change  within a subpattern (see below for a description of
+       subpatterns) affects only that part of the subpattern that follows  it,
        so


          (a(?i)b)c


        matches abc and aBc and no other strings (assuming PCRE_CASELESS is not
-       used).   By  this means, options can be made to have different settings
-       in different parts of the pattern. Any changes made in one  alternative
-       do  carry  on  into subsequent branches within the same subpattern. For
+       used).  By this means, options can be made to have  different  settings
+       in  different parts of the pattern. Any changes made in one alternative
+       do carry on into subsequent branches within the  same  subpattern.  For
        example,


          (a(?i)b|c)


-       matches "ab", "aB", "c", and "C", even though  when  matching  "C"  the
-       first  branch  is  abandoned before the option setting. This is because
-       the effects of option settings happen at compile time. There  would  be
+       matches  "ab",  "aB",  "c",  and "C", even though when matching "C" the
+       first branch is abandoned before the option setting.  This  is  because
+       the  effects  of option settings happen at compile time. There would be
        some very weird behaviour otherwise.


-       Note:  There  are  other  PCRE-specific  options that can be set by the
-       application when the compiling or matching  functions  are  called.  In
-       some  cases  the  pattern can contain special leading sequences such as
-       (*CRLF) to override what the application  has  set  or  what  has  been
-       defaulted.   Details   are  given  in  the  section  entitled  "Newline
-       sequences" above. There are also  the  (*UTF8),  (*UTF16),  and  (*UCP)
-       leading  sequences  that  can  be  used to set UTF and Unicode property
-       modes; they are equivalent to setting the  PCRE_UTF8,  PCRE_UTF16,  and
+       Note: There are other PCRE-specific options that  can  be  set  by  the
+       application  when  the  compiling  or matching functions are called. In
+       some cases the pattern can contain special leading  sequences  such  as
+       (*CRLF)  to  override  what  the  application  has set or what has been
+       defaulted.  Details  are  given  in  the  section   entitled   "Newline
+       sequences"  above.  There  are  also  the (*UTF8), (*UTF16), and (*UCP)
+       leading sequences that can be used to  set  UTF  and  Unicode  property
+       modes;  they  are  equivalent to setting the PCRE_UTF8, PCRE_UTF16, and
        the PCRE_UCP options, respectively.



@@ -4971,18 +4974,18 @@

          cat(aract|erpillar|)


-       matches "cataract", "caterpillar", or "cat". Without  the  parentheses,
+       matches  "cataract",  "caterpillar", or "cat". Without the parentheses,
        it would match "cataract", "erpillar" or an empty string.


-       2.  It  sets  up  the  subpattern as a capturing subpattern. This means
-       that, when the whole pattern  matches,  that  portion  of  the  subject
+       2. It sets up the subpattern as  a  capturing  subpattern.  This  means
+       that,  when  the  whole  pattern  matches,  that portion of the subject
        string that matched the subpattern is passed back to the caller via the
-       ovector argument of the matching function. (This applies  only  to  the
-       traditional  matching functions; the DFA matching functions do not sup-
+       ovector  argument  of  the matching function. (This applies only to the
+       traditional matching functions; the DFA matching functions do not  sup-
        port capturing.)


        Opening parentheses are counted from left to right (starting from 1) to
-       obtain  numbers  for  the  capturing  subpatterns.  For example, if the
+       obtain numbers for the  capturing  subpatterns.  For  example,  if  the
        string "the red king" is matched against the pattern


          the ((red|white) (king|queen))
@@ -4990,12 +4993,12 @@
        the captured substrings are "red king", "red", and "king", and are num-
        bered 1, 2, and 3, respectively.


-       The  fact  that  plain  parentheses  fulfil two functions is not always
-       helpful.  There are often times when a grouping subpattern is  required
-       without  a capturing requirement. If an opening parenthesis is followed
-       by a question mark and a colon, the subpattern does not do any  captur-
-       ing,  and  is  not  counted when computing the number of any subsequent
-       capturing subpatterns. For example, if the string "the white queen"  is
+       The fact that plain parentheses fulfil  two  functions  is  not  always
+       helpful.   There are often times when a grouping subpattern is required
+       without a capturing requirement. If an opening parenthesis is  followed
+       by  a question mark and a colon, the subpattern does not do any captur-
+       ing, and is not counted when computing the  number  of  any  subsequent
+       capturing  subpatterns. For example, if the string "the white queen" is
        matched against the pattern


          the ((?:red|white) (king|queen))
@@ -5003,37 +5006,37 @@
        the captured substrings are "white queen" and "queen", and are numbered
        1 and 2. The maximum number of capturing subpatterns is 65535.


-       As a convenient shorthand, if any option settings are required  at  the
-       start  of  a  non-capturing  subpattern,  the option letters may appear
+       As  a  convenient shorthand, if any option settings are required at the
+       start of a non-capturing subpattern,  the  option  letters  may  appear
        between the "?" and the ":". Thus the two patterns


          (?i:saturday|sunday)
          (?:(?i)saturday|sunday)


        match exactly the same set of strings. Because alternative branches are
-       tried  from  left  to right, and options are not reset until the end of
-       the subpattern is reached, an option setting in one branch does  affect
-       subsequent  branches,  so  the above patterns match "SUNDAY" as well as
+       tried from left to right, and options are not reset until  the  end  of
+       the  subpattern is reached, an option setting in one branch does affect
+       subsequent branches, so the above patterns match "SUNDAY"  as  well  as
        "Saturday".



DUPLICATE SUBPATTERN NUMBERS

        Perl 5.10 introduced a feature whereby each alternative in a subpattern
-       uses  the same numbers for its capturing parentheses. Such a subpattern
-       starts with (?| and is itself a non-capturing subpattern. For  example,
+       uses the same numbers for its capturing parentheses. Such a  subpattern
+       starts  with (?| and is itself a non-capturing subpattern. For example,
        consider this pattern:


          (?|(Sat)ur|(Sun))day


-       Because  the two alternatives are inside a (?| group, both sets of cap-
-       turing parentheses are numbered one. Thus, when  the  pattern  matches,
-       you  can  look  at captured substring number one, whichever alternative
-       matched. This construct is useful when you want to  capture  part,  but
+       Because the two alternatives are inside a (?| group, both sets of  cap-
+       turing  parentheses  are  numbered one. Thus, when the pattern matches,
+       you can look at captured substring number  one,  whichever  alternative
+       matched.  This  construct  is useful when you want to capture part, but
        not all, of one of a number of alternatives. Inside a (?| group, paren-
-       theses are numbered as usual, but the number is reset at the  start  of
-       each  branch.  The numbers of any capturing parentheses that follow the
-       subpattern start after the highest number used in any branch. The  fol-
+       theses  are  numbered as usual, but the number is reset at the start of
+       each branch. The numbers of any capturing parentheses that  follow  the
+       subpattern  start after the highest number used in any branch. The fol-
        lowing example is taken from the Perl documentation. The numbers under-
        neath show in which buffer the captured content will be stored.


@@ -5041,58 +5044,58 @@
          / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
          # 1            2         2  3        2     3     4


-       A back reference to a numbered subpattern uses the  most  recent  value
-       that  is  set  for that number by any subpattern. The following pattern
+       A  back  reference  to a numbered subpattern uses the most recent value
+       that is set for that number by any subpattern.  The  following  pattern
        matches "abcabc" or "defdef":


          /(?|(abc)|(def))\1/


-       In contrast, a subroutine call to a numbered subpattern  always  refers
-       to  the  first  one in the pattern with the given number. The following
+       In  contrast,  a subroutine call to a numbered subpattern always refers
+       to the first one in the pattern with the given  number.  The  following
        pattern matches "abcabc" or "defabc":


          /(?|(abc)|(def))(?1)/


-       If a condition test for a subpattern's having matched refers to a  non-
-       unique  number, the test is true if any of the subpatterns of that num-
+       If  a condition test for a subpattern's having matched refers to a non-
+       unique number, the test is true if any of the subpatterns of that  num-
        ber have matched.


-       An alternative approach to using this "branch reset" feature is to  use
+       An  alternative approach to using this "branch reset" feature is to use
        duplicate named subpatterns, as described in the next section.



NAMED SUBPATTERNS

-       Identifying  capturing  parentheses  by number is simple, but it can be
-       very hard to keep track of the numbers in complicated  regular  expres-
-       sions.  Furthermore,  if  an  expression  is  modified, the numbers may
-       change. To help with this difficulty, PCRE supports the naming of  sub-
+       Identifying capturing parentheses by number is simple, but  it  can  be
+       very  hard  to keep track of the numbers in complicated regular expres-
+       sions. Furthermore, if an  expression  is  modified,  the  numbers  may
+       change.  To help with this difficulty, PCRE supports the naming of sub-
        patterns. This feature was not added to Perl until release 5.10. Python
-       had the feature earlier, and PCRE introduced it at release  4.0,  using
-       the  Python syntax. PCRE now supports both the Perl and the Python syn-
-       tax. Perl allows identically numbered  subpatterns  to  have  different
+       had  the  feature earlier, and PCRE introduced it at release 4.0, using
+       the Python syntax. PCRE now supports both the Perl and the Python  syn-
+       tax.  Perl  allows  identically  numbered subpatterns to have different
        names, but PCRE does not.


-       In  PCRE,  a subpattern can be named in one of three ways: (?<name>...)
-       or (?'name'...) as in Perl, or (?P<name>...) as in  Python.  References
-       to  capturing parentheses from other parts of the pattern, such as back
-       references, recursion, and conditions, can be made by name as  well  as
+       In PCRE, a subpattern can be named in one of three  ways:  (?<name>...)
+       or  (?'name'...)  as in Perl, or (?P<name>...) as in Python. References
+       to capturing parentheses from other parts of the pattern, such as  back
+       references,  recursion,  and conditions, can be made by name as well as
        by number.


-       Names  consist  of  up  to  32 alphanumeric characters and underscores.
-       Named capturing parentheses are still  allocated  numbers  as  well  as
-       names,  exactly as if the names were not present. The PCRE API provides
+       Names consist of up to  32  alphanumeric  characters  and  underscores.
+       Named  capturing  parentheses  are  still  allocated numbers as well as
+       names, exactly as if the names were not present. The PCRE API  provides
        function calls for extracting the name-to-number translation table from
        a compiled pattern. There is also a convenience function for extracting
        a captured substring by name.


-       By default, a name must be unique within a pattern, but it is  possible
+       By  default, a name must be unique within a pattern, but it is possible
        to relax this constraint by setting the PCRE_DUPNAMES option at compile
-       time. (Duplicate names are also always permitted for  subpatterns  with
-       the  same  number, set up as described in the previous section.) Dupli-
-       cate names can be useful for patterns where only one  instance  of  the
-       named  parentheses  can  match. Suppose you want to match the name of a
-       weekday, either as a 3-letter abbreviation or as the full name, and  in
+       time.  (Duplicate  names are also always permitted for subpatterns with
+       the same number, set up as described in the previous  section.)  Dupli-
+       cate  names  can  be useful for patterns where only one instance of the
+       named parentheses can match. Suppose you want to match the  name  of  a
+       weekday,  either as a 3-letter abbreviation or as the full name, and in
        both cases you want to extract the abbreviation. This pattern (ignoring
        the line breaks) does the job:


@@ -5102,38 +5105,38 @@
          (?<DN>Thu)(?:rsday)?|
          (?<DN>Sat)(?:urday)?


-       There are five capturing substrings, but only one is ever set  after  a
+       There  are  five capturing substrings, but only one is ever set after a
        match.  (An alternative way of solving this problem is to use a "branch
        reset" subpattern, as described in the previous section.)


-       The convenience function for extracting the data by  name  returns  the
-       substring  for  the first (and in this example, the only) subpattern of
-       that name that matched. This saves searching  to  find  which  numbered
+       The  convenience  function  for extracting the data by name returns the
+       substring for the first (and in this example, the only)  subpattern  of
+       that  name  that  matched.  This saves searching to find which numbered
        subpattern it was.


-       If  you  make  a  back  reference to a non-unique named subpattern from
-       elsewhere in the pattern, the one that corresponds to the first  occur-
+       If you make a back reference to  a  non-unique  named  subpattern  from
+       elsewhere  in the pattern, the one that corresponds to the first occur-
        rence of the name is used. In the absence of duplicate numbers (see the
-       previous section) this is the one with the lowest number. If you use  a
-       named  reference  in a condition test (see the section about conditions
-       below), either to check whether a subpattern has matched, or  to  check
-       for  recursion,  all  subpatterns with the same name are tested. If the
-       condition is true for any one of them, the overall condition  is  true.
+       previous  section) this is the one with the lowest number. If you use a
+       named reference in a condition test (see the section  about  conditions
+       below),  either  to check whether a subpattern has matched, or to check
+       for recursion, all subpatterns with the same name are  tested.  If  the
+       condition  is  true for any one of them, the overall condition is true.
        This is the same behaviour as testing by number. For further details of
        the interfaces for handling named subpatterns, see the pcreapi documen-
        tation.


        Warning: You cannot use different names to distinguish between two sub-
-       patterns with the same number because PCRE uses only the  numbers  when
+       patterns  with  the same number because PCRE uses only the numbers when
        matching. For this reason, an error is given at compile time if differ-
-       ent names are given to subpatterns with the same number.  However,  you
-       can  give  the same name to subpatterns with the same number, even when
+       ent  names  are given to subpatterns with the same number. However, you
+       can give the same name to subpatterns with the same number,  even  when
        PCRE_DUPNAMES is not set.



REPETITION

-       Repetition is specified by quantifiers, which can  follow  any  of  the
+       Repetition  is  specified  by  quantifiers, which can follow any of the
        following items:


          a literal data character
@@ -5147,17 +5150,17 @@
          a parenthesized subpattern (including assertions)
          a subroutine call to a subpattern (recursive or otherwise)


-       The  general repetition quantifier specifies a minimum and maximum num-
-       ber of permitted matches, by giving the two numbers in  curly  brackets
-       (braces),  separated  by  a comma. The numbers must be less than 65536,
+       The general repetition quantifier specifies a minimum and maximum  num-
+       ber  of  permitted matches, by giving the two numbers in curly brackets
+       (braces), separated by a comma. The numbers must be  less  than  65536,
        and the first must be less than or equal to the second. For example:


          z{2,4}


-       matches "zz", "zzz", or "zzzz". A closing brace on its  own  is  not  a
-       special  character.  If  the second number is omitted, but the comma is
-       present, there is no upper limit; if the second number  and  the  comma
-       are  both omitted, the quantifier specifies an exact number of required
+       matches  "zz",  "zzz",  or  "zzzz". A closing brace on its own is not a
+       special character. If the second number is omitted, but  the  comma  is
+       present,  there  is  no upper limit; if the second number and the comma
+       are both omitted, the quantifier specifies an exact number of  required
        matches. Thus


          [aeiou]{3,}
@@ -5166,49 +5169,49 @@


          \d{8}


-       matches exactly 8 digits. An opening curly bracket that  appears  in  a
-       position  where a quantifier is not allowed, or one that does not match
-       the syntax of a quantifier, is taken as a literal character. For  exam-
+       matches  exactly  8  digits. An opening curly bracket that appears in a
+       position where a quantifier is not allowed, or one that does not  match
+       the  syntax of a quantifier, is taken as a literal character. For exam-
        ple, {,6} is not a quantifier, but a literal string of four characters.


        In UTF modes, quantifiers apply to characters rather than to individual
-       data units. Thus, for example, \x{100}{2} matches two characters,  each
+       data  units. Thus, for example, \x{100}{2} matches two characters, each
        of which is represented by a two-byte sequence in a UTF-8 string. Simi-
-       larly, \X{3} matches three Unicode extended sequences,  each  of  which
+       larly,  \X{3}  matches  three Unicode extended sequences, each of which
        may be several data units long (and they may be of different lengths).


        The quantifier {0} is permitted, causing the expression to behave as if
        the previous item and the quantifier were not present. This may be use-
-       ful  for  subpatterns that are referenced as subroutines from elsewhere
+       ful for subpatterns that are referenced as subroutines  from  elsewhere
        in the pattern (but see also the section entitled "Defining subpatterns
-       for  use  by  reference only" below). Items other than subpatterns that
+       for use by reference only" below). Items other  than  subpatterns  that
        have a {0} quantifier are omitted from the compiled pattern.


-       For convenience, the three most common quantifiers have  single-charac-
+       For  convenience, the three most common quantifiers have single-charac-
        ter abbreviations:


          *    is equivalent to {0,}
          +    is equivalent to {1,}
          ?    is equivalent to {0,1}


-       It  is  possible  to construct infinite loops by following a subpattern
+       It is possible to construct infinite loops by  following  a  subpattern
        that can match no characters with a quantifier that has no upper limit,
        for example:


          (a?)*


        Earlier versions of Perl and PCRE used to give an error at compile time
-       for such patterns. However, because there are cases where this  can  be
-       useful,  such  patterns  are now accepted, but if any repetition of the
-       subpattern does in fact match no characters, the loop is forcibly  bro-
+       for  such  patterns. However, because there are cases where this can be
+       useful, such patterns are now accepted, but if any  repetition  of  the
+       subpattern  does in fact match no characters, the loop is forcibly bro-
        ken.


-       By  default,  the quantifiers are "greedy", that is, they match as much
-       as possible (up to the maximum  number  of  permitted  times),  without
-       causing  the  rest of the pattern to fail. The classic example of where
+       By default, the quantifiers are "greedy", that is, they match  as  much
+       as  possible  (up  to  the  maximum number of permitted times), without
+       causing the rest of the pattern to fail. The classic example  of  where
        this gives problems is in trying to match comments in C programs. These
-       appear  between  /*  and  */ and within the comment, individual * and /
-       characters may appear. An attempt to match C comments by  applying  the
+       appear between /* and */ and within the comment,  individual  *  and  /
+       characters  may  appear. An attempt to match C comments by applying the
        pattern


          /\*.*\*/
@@ -5217,19 +5220,19 @@


          /* first comment */  not comment  /* second comment */


-       fails,  because it matches the entire string owing to the greediness of
+       fails, because it matches the entire string owing to the greediness  of
        the .*  item.


-       However, if a quantifier is followed by a question mark, it  ceases  to
+       However,  if  a quantifier is followed by a question mark, it ceases to
        be greedy, and instead matches the minimum number of times possible, so
        the pattern


          /\*.*?\*/


-       does the right thing with the C comments. The meaning  of  the  various
-       quantifiers  is  not  otherwise  changed,  just the preferred number of
-       matches.  Do not confuse this use of question mark with its  use  as  a
-       quantifier  in its own right. Because it has two uses, it can sometimes
+       does  the  right  thing with the C comments. The meaning of the various
+       quantifiers is not otherwise changed,  just  the  preferred  number  of
+       matches.   Do  not  confuse this use of question mark with its use as a
+       quantifier in its own right. Because it has two uses, it can  sometimes
        appear doubled, as in


          \d??\d
@@ -5237,36 +5240,36 @@
        which matches one digit by preference, but can match two if that is the
        only way the rest of the pattern matches.


-       If  the PCRE_UNGREEDY option is set (an option that is not available in
-       Perl), the quantifiers are not greedy by default, but  individual  ones
-       can  be  made  greedy  by following them with a question mark. In other
+       If the PCRE_UNGREEDY option is set (an option that is not available  in
+       Perl),  the  quantifiers are not greedy by default, but individual ones
+       can be made greedy by following them with a  question  mark.  In  other
        words, it inverts the default behaviour.


-       When a parenthesized subpattern is quantified  with  a  minimum  repeat
-       count  that is greater than 1 or with a limited maximum, more memory is
-       required for the compiled pattern, in proportion to  the  size  of  the
+       When  a  parenthesized  subpattern  is quantified with a minimum repeat
+       count that is greater than 1 or with a limited maximum, more memory  is
+       required  for  the  compiled  pattern, in proportion to the size of the
        minimum or maximum.


        If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equiv-
-       alent to Perl's /s) is set, thus allowing the dot  to  match  newlines,
-       the  pattern  is  implicitly anchored, because whatever follows will be
-       tried against every character position in the subject string, so  there
-       is  no  point  in  retrying the overall match at any position after the
-       first. PCRE normally treats such a pattern as though it  were  preceded
+       alent  to  Perl's  /s) is set, thus allowing the dot to match newlines,
+       the pattern is implicitly anchored, because whatever  follows  will  be
+       tried  against every character position in the subject string, so there
+       is no point in retrying the overall match at  any  position  after  the
+       first.  PCRE  normally treats such a pattern as though it were preceded
        by \A.


-       In  cases  where  it  is known that the subject string contains no new-
-       lines, it is worth setting PCRE_DOTALL in order to  obtain  this  opti-
+       In cases where it is known that the subject  string  contains  no  new-
+       lines,  it  is  worth setting PCRE_DOTALL in order to obtain this opti-
        mization, or alternatively using ^ to indicate anchoring explicitly.


-       However,  there is one situation where the optimization cannot be used.
+       However, there is one situation where the optimization cannot be  used.
        When .*  is inside capturing parentheses that are the subject of a back
        reference elsewhere in the pattern, a match at the start may fail where
        a later one succeeds. Consider, for example:


          (.*)abc\1


-       If the subject is "xyz123abc123" the match point is the fourth  charac-
+       If  the subject is "xyz123abc123" the match point is the fourth charac-
        ter. For this reason, such a pattern is not implicitly anchored.
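
As a rough check of the point above, the sketch below uses Python's `re` module as a stand-in for PCRE. That is an assumption made for illustration only: the two engines agree on this construct, but Python is not PCRE. It suggests that the match for (.*)abc\1 begins at offset 3 (the fourth character), which is why the pattern cannot be treated as anchored at the start.

```python
import re

# Python's re is used only as an illustrative stand-in for PCRE.
pattern = re.compile(r"(.*)abc\1")
subject = "xyz123abc123"

# search() may start the match anywhere in the subject.
m = pattern.search(subject)
start = m.start()        # zero-based offset of the match point
matched = m.group(0)     # the whole match
captured = m.group(1)    # what (.*) captured

# match() only tries position 0, which is what implicit anchoring would do.
anchored = pattern.match(subject)

print(start, matched, captured, anchored)
```

An attempt anchored at the start fails, while the unanchored search succeeds further along, so implicit anchoring would lose the match.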


        When a capturing subpattern is repeated, the value captured is the sub-
@@ -5275,8 +5278,8 @@
          (tweedle[dume]{3}\s*)+


        has matched "tweedledum tweedledee" the value of the captured substring
-       is  "tweedledee".  However,  if there are nested capturing subpatterns,
-       the corresponding captured values may have been set in previous  itera-
+       is "tweedledee". However, if there are  nested  capturing  subpatterns,
+       the  corresponding captured values may have been set in previous itera-
        tions. For example, after


          /(a|(b))+/
@@ -5286,53 +5289,53 @@


ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS

-       With  both  maximizing ("greedy") and minimizing ("ungreedy" or "lazy")
-       repetition, failure of what follows normally causes the  repeated  item
-       to  be  re-evaluated to see if a different number of repeats allows the
-       rest of the pattern to match. Sometimes it is useful to  prevent  this,
-       either to change the nature of the match, or to cause it to fail earlier
-       than it otherwise might, when the author of the pattern knows there  is
+       With both maximizing ("greedy") and minimizing ("ungreedy"  or  "lazy")
+       repetition,  failure  of what follows normally causes the repeated item
+       to be re-evaluated to see if a different number of repeats  allows  the
+       rest  of  the pattern to match. Sometimes it is useful to prevent this,
+       either to change the nature of the match, or to cause it to fail earlier
+       than  it otherwise might, when the author of the pattern knows there is
        no point in carrying on.


-       Consider,  for  example, the pattern \d+foo when applied to the subject
+       Consider, for example, the pattern \d+foo when applied to  the  subject
        line


          123456bar


        After matching all 6 digits and then failing to match "foo", the normal
-       action  of  the matcher is to try again with only 5 digits matching the
-       \d+ item, and then with  4,  and  so  on,  before  ultimately  failing.
-       "Atomic  grouping"  (a  term taken from Jeffrey Friedl's book) provides
-       the means for specifying that once a subpattern has matched, it is  not
+       action of the matcher is to try again with only 5 digits  matching  the
+       \d+  item,  and  then  with  4,  and  so on, before ultimately failing.
+       "Atomic grouping" (a term taken from Jeffrey  Friedl's  book)  provides
+       the  means for specifying that once a subpattern has matched, it is not
        to be re-evaluated in this way.


-       If  we  use atomic grouping for the previous example, the matcher gives
-       up immediately on failing to match "foo" the first time.  The  notation
+       If we use atomic grouping for the previous example, the  matcher  gives
+       up  immediately  on failing to match "foo" the first time. The notation
        is a kind of special parenthesis, starting with (?> as in this example:


          (?>\d+)foo


-       This  kind  of  parenthesis "locks up" the  part of the pattern it con-
-       tains once it has matched, and a failure further into  the  pattern  is
-       prevented  from  backtracking into it. Backtracking past it to previous
+       This kind of parenthesis "locks up" the  part of the  pattern  it  con-
+       tains  once  it  has matched, and a failure further into the pattern is
+       prevented from backtracking into it. Backtracking past it  to  previous
        items, however, works as normal.


-       An alternative description is that a subpattern of  this  type  matches
-       the  string  of  characters  that an identical standalone pattern would
+       An  alternative  description  is that a subpattern of this type matches
+       the string of characters that an  identical  standalone  pattern  would
        match, if anchored at the current point in the subject string.


        Atomic grouping subpatterns are not capturing subpatterns. Simple cases
        such as the above example can be thought of as a maximizing repeat that
-       must swallow everything it can. So, while both \d+ and  \d+?  are  pre-
-       pared  to  adjust  the number of digits they match in order to make the
+       must  swallow  everything  it can. So, while both \d+ and \d+? are pre-
+       pared to adjust the number of digits they match in order  to  make  the
        rest of the pattern match, (?>\d+) can only match an entire sequence of
        digits.


-       Atomic  groups in general can of course contain arbitrarily complicated
-       subpatterns, and can be nested. However, when  the  subpattern  for  an
+       Atomic groups in general can of course contain arbitrarily  complicated
+       subpatterns,  and  can  be  nested. However, when the subpattern for an
        atomic group is just a single repeated item, as in the example above, a
-       simpler notation, called a "possessive quantifier" can  be  used.  This
-       consists  of  an  additional  + character following a quantifier. Using
+       simpler  notation,  called  a "possessive quantifier" can be used. This
+       consists of an additional + character  following  a  quantifier.  Using
        this notation, the previous example can be rewritten as


          \d++foo
@@ -5342,45 +5345,45 @@


          (abc|xyz){2,3}+


-       Possessive   quantifiers   are   always  greedy;  the  setting  of  the
+       Possessive  quantifiers  are  always  greedy;  the   setting   of   the
        PCRE_UNGREEDY option is ignored. They are a convenient notation for the
-       simpler  forms  of atomic group. However, there is no difference in the
-       meaning of a possessive quantifier and  the  equivalent  atomic  group,
-       though  there  may  be a performance difference; possessive quantifiers
+       simpler forms of atomic group. However, there is no difference  in  the
+       meaning  of  a  possessive  quantifier and the equivalent atomic group,
+       though there may be a performance  difference;  possessive  quantifiers
        should be slightly faster.


-       The possessive quantifier syntax is an extension to the Perl  5.8  syn-
-       tax.   Jeffrey  Friedl  originated the idea (and the name) in the first
+       The  possessive  quantifier syntax is an extension to the Perl 5.8 syn-
+       tax.  Jeffrey Friedl originated the idea (and the name)  in  the  first
        edition of his book. Mike McCloskey liked it, so implemented it when he
-       built  Sun's Java package, and PCRE copied it from there. It ultimately
+       built Sun's Java package, and PCRE copied it from there. It  ultimately
        found its way into Perl at release 5.10.


        PCRE has an optimization that automatically "possessifies" certain sim-
-       ple  pattern  constructs.  For  example, the sequence A+B is treated as
-       A++B because there is no point in backtracking into a sequence  of  A's
+       ple pattern constructs. For example, the sequence  A+B  is  treated  as
+       A++B  because  there is no point in backtracking into a sequence of A's
        when B must follow.


-       When  a  pattern  contains an unlimited repeat inside a subpattern that
-       can itself be repeated an unlimited number of  times,  the  use  of  an
-       atomic  group  is  the  only way to avoid some failing matches taking a
+       When a pattern contains an unlimited repeat inside  a  subpattern  that
+       can  itself  be  repeated  an  unlimited number of times, the use of an
+       atomic group is the only way to avoid some  failing  matches  taking  a
        very long time indeed. The pattern


          (\D+|<\d+>)*[!?]


-       matches an unlimited number of substrings that either consist  of  non-
-       digits,  or  digits  enclosed in <>, followed by either ! or ?. When it
+       matches  an  unlimited number of substrings that either consist of non-
+       digits, or digits enclosed in <>, followed by either ! or  ?.  When  it
        matches, it runs quickly. However, if it is applied to


          aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa


-       it takes a long time before reporting  failure.  This  is  because  the
-       string  can be divided between the internal \D+ repeat and the external
-       * repeat in a large number of ways, and all  have  to  be  tried.  (The
-       example  uses  [!?]  rather than a single character at the end, because
-       both PCRE and Perl have an optimization that allows  for  fast  failure
-       when  a single character is used. They remember the last single charac-
-       ter that is required for a match, and fail early if it is  not  present
-       in  the  string.)  If  the pattern is changed so that it uses an atomic
+       it  takes  a  long  time  before reporting failure. This is because the
+       string can be divided between the internal \D+ repeat and the  external
+       *  repeat  in  a  large  number of ways, and all have to be tried. (The
+       example uses [!?] rather than a single character at  the  end,  because
+       both  PCRE  and  Perl have an optimization that allows for fast failure
+       when a single character is used. They remember the last single  charac-
+       ter  that  is required for a match, and fail early if it is not present
+       in the string.) If the pattern is changed so that  it  uses  an  atomic
        group, like this:


          ((?>\D+)|<\d+>)*[!?]
@@ -5392,28 +5395,28 @@


        Outside a character class, a backslash followed by a digit greater than
        0 (and possibly further digits) is a back reference to a capturing sub-
-       pattern earlier (that is, to its left) in the pattern,  provided  there
+       pattern  earlier  (that is, to its left) in the pattern, provided there
        have been that many previous capturing left parentheses.


        However, if the decimal number following the backslash is less than 10,
-       it is always taken as a back reference, and causes  an  error  only  if
-       there  are  not that many capturing left parentheses in the entire pat-
-       tern. In other words, the parentheses that are referenced need  not  be
-       to  the left of the reference for numbers less than 10. A "forward back
-       reference" of this type can make sense when a  repetition  is  involved
-       and  the  subpattern to the right has participated in an earlier itera-
+       it  is  always  taken  as a back reference, and causes an error only if
+       there are not that many capturing left parentheses in the  entire  pat-
+       tern.  In  other words, the parentheses that are referenced need not be
+       to the left of the reference for numbers less than 10. A "forward  back
+       reference"  of  this  type can make sense when a repetition is involved
+       and the subpattern to the right has participated in an  earlier  itera-
        tion.


-       It is not possible to have a numerical "forward back  reference"  to  a
-       subpattern  whose  number  is  10  or  more using this syntax because a
-       sequence such as \50 is interpreted as a character  defined  in  octal.
+       It  is  not  possible to have a numerical "forward back reference" to a
+       subpattern whose number is 10 or  more  using  this  syntax  because  a
+       sequence  such  as  \50 is interpreted as a character defined in octal.
        See the subsection entitled "Non-printing characters" above for further
-       details of the handling of digits following a backslash.  There  is  no
-       such  problem  when named parentheses are used. A back reference to any
+       details  of  the  handling of digits following a backslash. There is no
+       such problem when named parentheses are used. A back reference  to  any
        subpattern is possible using named parentheses (see below).


-       Another way of avoiding the ambiguity inherent in  the  use  of  digits
-       following  a  backslash  is  to use the \g escape sequence. This escape
+       Another  way  of  avoiding  the ambiguity inherent in the use of digits
+       following a backslash is to use the \g  escape  sequence.  This  escape
        must be followed by an unsigned number or a negative number, optionally
        enclosed in braces. These examples are all identical:


@@ -5421,7 +5424,7 @@
          (ring), \g1
          (ring), \g{1}


-       An  unsigned number specifies an absolute reference without the ambigu-
+       An unsigned number specifies an absolute reference without the  ambigu-
        ity that is present in the older syntax. It is also useful when literal
        digits follow the reference. A negative number is a relative reference.
        Consider this example:
@@ -5430,33 +5433,33 @@


        The sequence \g{-1} is a reference to the most recently started captur-
        ing subpattern before \g, that is, it is equivalent to \2 in this exam-
-       ple.  Similarly, \g{-2} would be equivalent to \1. The use of  relative
-       references  can  be helpful in long patterns, and also in patterns that
-       are created by  joining  together  fragments  that  contain  references
+       ple.   Similarly, \g{-2} would be equivalent to \1. The use of relative
+       references can be helpful in long patterns, and also in  patterns  that
+       are  created  by  joining  together  fragments  that contain references
        within themselves.


-       A  back  reference matches whatever actually matched the capturing sub-
-       pattern in the current subject string, rather  than  anything  matching
+       A back reference matches whatever actually matched the  capturing  sub-
+       pattern  in  the  current subject string, rather than anything matching
        the subpattern itself (see "Subpatterns as subroutines" below for a way
        of doing that). So the pattern


          (sens|respons)e and \1ibility


-       matches "sense and sensibility" and "response and responsibility",  but
-       not  "sense and responsibility". If caseful matching is in force at the
-       time of the back reference, the case of letters is relevant. For  exam-
+       matches  "sense and sensibility" and "response and responsibility", but
+       not "sense and responsibility". If caseful matching is in force at  the
+       time  of the back reference, the case of letters is relevant. For exam-
        ple,


          ((?i)rah)\s+\1


-       matches  "rah  rah"  and  "RAH RAH", but not "RAH rah", even though the
+       matches "rah rah" and "RAH RAH", but not "RAH  rah",  even  though  the
        original capturing subpattern is matched caselessly.
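
The two back reference examples above can be tried with a short sketch. It uses Python's `re` module rather than PCRE, so treat it as an approximation. Current Python versions do not accept an option setting such as (?i) in the middle of a pattern, so the scoped form (?i:rah) is substituted here; that substitution is an assumption of this sketch, not PCRE syntax taken from the text.

```python
import re

# Back reference: \1 must repeat the text that group 1 actually matched.
sense = re.compile(r"(sens|respons)e and \1ibility")

# Caseless group, caseful back reference. (?i:...) replaces the
# mid-pattern (?i) used in the PCRE example (assumption, see lead-in).
rah = re.compile(r"((?i:rah))\s+\1")

def full(pattern, subject):
    """Return True if the whole subject matches the pattern."""
    return pattern.fullmatch(subject) is not None

sense_results = {
    s: full(sense, s)
    for s in (
        "sense and sensibility",
        "response and responsibility",
        "sense and responsibility",
    )
}
rah_results = {s: full(rah, s) for s in ("rah rah", "RAH RAH", "RAH rah")}

print(sense_results)
print(rah_results)
```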


-       There are several different ways of writing back  references  to  named
-       subpatterns.  The  .NET syntax \k{name} and the Perl syntax \k<name> or
-       \k'name' are supported, as is the Python syntax (?P=name). Perl  5.10's
+       There  are  several  different ways of writing back references to named
+       subpatterns. The .NET syntax \k{name} and the Perl syntax  \k<name>  or
+       \k'name'  are supported, as is the Python syntax (?P=name). Perl 5.10's
        unified back reference syntax, in which \g can be used for both numeric
-       and named references, is also supported. We  could  rewrite  the  above
+       and  named  references,  is  also supported. We could rewrite the above
        example in any of the following ways:


          (?<p1>(?i)rah)\s+\k<p1>
@@ -5464,83 +5467,83 @@
          (?P<p1>(?i)rah)\s+(?P=p1)
          (?<p1>(?i)rah)\s+\g{p1}


-       A  subpattern  that  is  referenced  by  name may appear in the pattern
+       A subpattern that is referenced by  name  may  appear  in  the  pattern
        before or after the reference.


-       There may be more than one back reference to the same subpattern. If  a
-       subpattern  has  not actually been used in a particular match, any back
+       There  may be more than one back reference to the same subpattern. If a
+       subpattern has not actually been used in a particular match,  any  back
        references to it always fail by default. For example, the pattern


          (a|(bc))\2


-       always fails if it starts to match "a" rather than  "bc".  However,  if
+       always  fails  if  it starts to match "a" rather than "bc". However, if
        the PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back refer-
        ence to an unset value matches an empty string.
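
A minimal sketch of the default behaviour described above, again using Python's `re` module as a stand-in (an assumption: Python, like PCRE without PCRE_JAVASCRIPT_COMPAT, fails a back reference to a group that has not been set). The PCRE_JAVASCRIPT_COMPAT variant is not modelled here.

```python
import re

# Group 2 is set only when the "bc" alternative is taken.
pattern = re.compile(r"(a|(bc))\2")

# "bc" alternative taken: group 2 is "bc", so \2 can match the second "bc".
took_bc = pattern.fullmatch("bcbc")

# "a" alternative taken: group 2 is unset, so \2 fails at every position.
took_a_aa = pattern.search("aa")
took_a_abc = pattern.search("abc")

print(took_bc is not None, took_a_aa, took_a_abc)
```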


-       Because there may be many capturing parentheses in a pattern, all  dig-
-       its  following a backslash are taken as part of a potential back refer-
-       ence number.  If the pattern continues with  a  digit  character,  some
-       delimiter  must  be  used  to  terminate  the  back  reference.  If the
+       Because  there may be many capturing parentheses in a pattern, all dig-
+       its following a backslash are taken as part of a potential back  refer-
+       ence  number.   If  the  pattern continues with a digit character, some
+       delimiter must  be  used  to  terminate  the  back  reference.  If  the
        PCRE_EXTENDED option is set, this can be whitespace. Otherwise, the \g{
        syntax or an empty comment (see "Comments" below) can be used.


    Recursive back references


-       A  back reference that occurs inside the parentheses to which it refers
-       fails when the subpattern is first used, so, for example,  (a\1)  never
-       matches.   However,  such references can be useful inside repeated sub-
+       A back reference that occurs inside the parentheses to which it  refers
+       fails  when  the subpattern is first used, so, for example, (a\1) never
+       matches.  However, such references can be useful inside  repeated  sub-
        patterns. For example, the pattern


          (a|b\1)+


        matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
-       ation  of  the  subpattern,  the  back  reference matches the character
-       string corresponding to the previous iteration. In order  for  this  to
-       work,  the  pattern must be such that the first iteration does not need
-       to match the back reference. This can be done using alternation, as  in
+       ation of the subpattern,  the  back  reference  matches  the  character
+       string  corresponding  to  the previous iteration. In order for this to
+       work, the pattern must be such that the first iteration does  not  need
+       to  match the back reference. This can be done using alternation, as in
        the example above, or by a quantifier with a minimum of zero.


-       Back  references of this type cause the group that they reference to be
-       treated as an atomic group.  Once the whole group has been  matched,  a
-       subsequent  matching  failure cannot cause backtracking into the middle
+       Back references of this type cause the group that they reference to  be
+       treated  as  an atomic group.  Once the whole group has been matched, a
+       subsequent matching failure cannot cause backtracking into  the  middle
        of the group.



ASSERTIONS

-       An assertion is a test on the characters  following  or  preceding  the
-       current  matching  point that does not actually consume any characters.
-       The simple assertions coded as \b, \B, \A, \G, \Z,  \z,  ^  and  $  are
+       An  assertion  is  a  test on the characters following or preceding the
+       current matching point that does not actually consume  any  characters.
+       The  simple  assertions  coded  as  \b, \B, \A, \G, \Z, \z, ^ and $ are
        described above.


-       More  complicated  assertions  are  coded as subpatterns. There are two
-       kinds: those that look ahead of the current  position  in  the  subject
-       string,  and  those  that  look  behind  it. An assertion subpattern is
-       matched in the normal way, except that it does not  cause  the  current
+       More complicated assertions are coded as  subpatterns.  There  are  two
+       kinds:  those  that  look  ahead of the current position in the subject
+       string, and those that look  behind  it.  An  assertion  subpattern  is
+       matched  in  the  normal way, except that it does not cause the current
        matching position to be changed.


-       Assertion  subpatterns are not capturing subpatterns. If such an asser-
-       tion contains capturing subpatterns within it, these  are  counted  for
-       the  purposes  of numbering the capturing subpatterns in the whole pat-
-       tern. However, substring capturing is carried  out  only  for  positive
+       Assertion subpatterns are not capturing subpatterns. If such an  asser-
+       tion  contains  capturing  subpatterns within it, these are counted for
+       the purposes of numbering the capturing subpatterns in the  whole  pat-
+       tern.  However,  substring  capturing  is carried out only for positive
        assertions, because it does not make sense for negative assertions.


-       For  compatibility  with  Perl,  assertion subpatterns may be repeated;
-       though it makes no sense to assert the same thing  several  times,  the
-       side  effect  of  capturing  parentheses may occasionally be useful. In
+       For compatibility with Perl, assertion  subpatterns  may  be  repeated;
+       though  it  makes  no sense to assert the same thing several times, the
+       side effect of capturing parentheses may  occasionally  be  useful.  In
        practice, there are only three cases:


-       (1) If the quantifier is {0}, the  assertion  is  never  obeyed  during
-       matching.   However,  it  may  contain internal capturing parenthesized
+       (1)  If  the  quantifier  is  {0}, the assertion is never obeyed during
+       matching.  However, it may  contain  internal  capturing  parenthesized
        groups that are called from elsewhere via the subroutine mechanism.


-       (2) If the quantifier is {0,n} where n is greater than zero, it is treated
-       as  if  it  were  {0,1}.  At run time, the rest of the pattern match is
+       (2) If the quantifier is {0,n} where n is greater than zero, it is treated
+       as if it were {0,1}. At run time, the rest  of  the  pattern  match  is
        tried with and without the assertion, the order depending on the greed-
        iness of the quantifier.


-       (3)  If  the minimum repetition is greater than zero, the quantifier is
-       ignored.  The assertion is obeyed just  once  when  encountered  during
+       (3) If the minimum repetition is greater than zero, the  quantifier  is
+       ignored.   The  assertion  is  obeyed just once when encountered during
        matching.


    Lookahead assertions
@@ -5550,38 +5553,38 @@


          \w+(?=;)


-       matches a word followed by a semicolon, but does not include the  semi-
+       matches  a word followed by a semicolon, but does not include the semi-
        colon in the match, and


          foo(?!bar)


-       matches  any  occurrence  of  "foo" that is not followed by "bar". Note
+       matches any occurrence of "foo" that is not  followed  by  "bar".  Note
        that the apparently similar pattern


          (?!foo)bar


-       does not find an occurrence of "bar"  that  is  preceded  by  something
-       other  than "foo"; it finds any occurrence of "bar" whatsoever, because
+       does  not  find  an  occurrence  of "bar" that is preceded by something
+       other than "foo"; it finds any occurrence of "bar" whatsoever,  because
        the assertion (?!foo) is always true when the next three characters are
        "bar". A lookbehind assertion is needed to achieve the other effect.


        If you want to force a matching failure at some point in a pattern, the
-       most convenient way to do it is  with  (?!)  because  an  empty  string
-       always  matches, so an assertion that requires there not to be an empty
+       most  convenient  way  to  do  it  is with (?!) because an empty string
+       always matches, so an assertion that requires there not to be an  empty
        string must always fail.  The backtracking control verb (*FAIL) or (*F)
        is a synonym for (?!).
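
The lookahead examples in this subsection can be compared side by side in a short sketch. Python's `re` module is used as a stand-in for PCRE (an assumption; the constructs shown behave the same way in both for these inputs), and the subject strings are invented for illustration.

```python
import re

# Positive lookahead: the semicolon is required but not part of the match.
word = re.search(r"\w+(?=;)", "alpha; beta")
word_text = word.group(0)

# Negative lookahead: only the "foo" that is not followed by "bar" matches.
foo_starts = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# (?!foo)bar still finds "bar" after "foo": at the point where "bar"
# starts, the next three characters are "bar", never "foo".
lookahead_bar = re.search(r"(?!foo)bar", "foobar")

# A negative lookbehind is what actually rejects "bar" preceded by "foo".
lookbehind_bar = re.search(r"(?<!foo)bar", "foobar")

print(word_text, foo_starts, lookahead_bar.start(), lookbehind_bar)
```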


    Lookbehind assertions


-       Lookbehind  assertions start with (?<= for positive assertions and (?<!
+       Lookbehind assertions start with (?<= for positive assertions and  (?<!
        for negative assertions. For example,


          (?<!foo)bar


-       does find an occurrence of "bar" that is not  preceded  by  "foo".  The
-       contents  of  a  lookbehind  assertion are restricted such that all the
+       does  find  an  occurrence  of "bar" that is not preceded by "foo". The
+       contents of a lookbehind assertion are restricted  such  that  all  the
        strings it matches must have a fixed length. However, if there are sev-
-       eral  top-level  alternatives,  they  do  not all have to have the same
+       eral top-level alternatives, they do not all  have  to  have  the  same
        fixed length. Thus


          (?<=bullock|donkey)
@@ -5590,62 +5593,62 @@


          (?<!dogs?|cats?)


-       causes an error at compile time. Branches that match  different  length
-       strings  are permitted only at the top level of a lookbehind assertion.
+       causes  an  error at compile time. Branches that match different length
+       strings are permitted only at the top level of a lookbehind  assertion.
        This is an extension compared with Perl, which requires all branches to
        match the same length of string. An assertion such as


          (?<=ab(c|de))


-       is  not  permitted,  because  its single top-level branch can match two
+       is not permitted, because its single top-level  branch  can  match  two
        different lengths, but it is acceptable to PCRE if rewritten to use two
        top-level branches:


          (?<=abc|abde)


-       In  some  cases, the escape sequence \K (see above) can be used instead
+       In some cases, the escape sequence \K (see above) can be  used  instead
        of a lookbehind assertion to get round the fixed-length restriction.


-       The implementation of lookbehind assertions is, for  each  alternative,
-       to  temporarily  move the current position back by the fixed length and
+       The  implementation  of lookbehind assertions is, for each alternative,
+       to temporarily move the current position back by the fixed  length  and
        then try to match. If there are insufficient characters before the cur-
        rent position, the assertion fails.


-       In  a UTF mode, PCRE does not allow the \C escape (which matches a sin-
-       gle data unit even in a UTF mode) to appear in  lookbehind  assertions,
-       because  it  makes it impossible to calculate the length of the lookbe-
-       hind. The \X and \R escapes, which can match different numbers of  data
+       In a UTF mode, PCRE does not allow the \C escape (which matches a  sin-
+       gle  data  unit even in a UTF mode) to appear in lookbehind assertions,
+       because it makes it impossible to calculate the length of  the  lookbe-
+       hind.  The \X and \R escapes, which can match different numbers of data
        units, are also not permitted.


-       "Subroutine"  calls  (see below) such as (?2) or (?&X) are permitted in
-       lookbehinds, as long as the subpattern matches a  fixed-length  string.
+       "Subroutine" calls (see below) such as (?2) or (?&X) are  permitted  in
+       lookbehinds,  as  long as the subpattern matches a fixed-length string.
        Recursion, however, is not supported.


-       Possessive  quantifiers  can  be  used  in  conjunction with lookbehind
+       Possessive quantifiers can  be  used  in  conjunction  with  lookbehind
        assertions to specify efficient matching of fixed-length strings at the
        end of subject strings. Consider a simple pattern such as


          abcd$


-       when  applied  to  a  long string that does not match. Because matching
+       when applied to a long string that does  not  match.  Because  matching
        proceeds from left to right, PCRE will look for each "a" in the subject
-       and  then  see  if what follows matches the rest of the pattern. If the
+       and then see if what follows matches the rest of the  pattern.  If  the
        pattern is specified as


          ^.*abcd$


-       the initial .* matches the entire string at first, but when this  fails
+       the  initial .* matches the entire string at first, but when this fails
        (because there is no following "a"), it backtracks to match all but the
-       last character, then all but the last two characters, and so  on.  Once
-       again  the search for "a" covers the entire string, from right to left,
+       last  character,  then all but the last two characters, and so on. Once
+       again the search for "a" covers the entire string, from right to  left,
        so we are no better off. However, if the pattern is written as


          ^.*+(?<=abcd)


-       there can be no backtracking for the .*+ item; it can  match  only  the
-       entire  string.  The subsequent lookbehind assertion does a single test
-       on the last four characters. If it fails, the match fails  immediately.
-       For  long  strings, this approach makes a significant difference to the
+       there  can  be  no backtracking for the .*+ item; it can match only the
+       entire string. The subsequent lookbehind assertion does a  single  test
+       on  the last four characters. If it fails, the match fails immediately.
+       For long strings, this approach makes a significant difference  to  the
        processing time.


    Using multiple assertions
@@ -5654,18 +5657,18 @@


          (?<=\d{3})(?<!999)foo


-       matches "foo" preceded by three digits that are not "999". Notice  that
-       each  of  the  assertions is applied independently at the same point in
-       the subject string. First there is a  check  that  the  previous  three
-       characters  are  all  digits,  and  then there is a check that the same
+       matches  "foo" preceded by three digits that are not "999". Notice that
+       each of the assertions is applied independently at the  same  point  in
+       the  subject  string.  First  there  is a check that the previous three
+       characters are all digits, and then there is  a  check  that  the  same
        three characters are not "999".  This pattern does not match "foo" pre-
-       ceded  by  six  characters,  the first of which are digits and the last
-       three of which are not "999". For example, it  doesn't  match  "123abc-
+       ceded by six characters, the first of which are  digits  and  the  last
+       three  of  which  are not "999". For example, it doesn't match "123abc-
        foo". A pattern to do that is


          (?<=\d{3}...)(?<!999)foo


-       This  time  the  first assertion looks at the preceding six characters,
+       This time the first assertion looks at the  preceding  six  characters,
        checking that the first three are digits, and then the second assertion
        checks that the preceding three characters are not "999".
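
The two patterns above can be tried out with a short sketch. It uses Python's re module rather than PCRE itself, on the assumption that fixed-length lookbehind behaves the same way for these cases; the names three_digits_not_999, digits_then_not_999 and found are illustrative only.

```python
import re

# Both lookbehinds are tested at the same position, immediately before "foo".
three_digits_not_999 = re.compile(r"(?<=\d{3})(?<!999)foo")

# The first lookbehind now spans six characters; the second still checks
# only the three characters immediately before "foo".
digits_then_not_999 = re.compile(r"(?<=\d{3}...)(?<!999)foo")


def found(pattern, subject):
    """Return True if the pattern matches anywhere in the subject."""
    return pattern.search(subject) is not None
```

With these definitions, "123foo" is found by the first pattern and "123abcfoo" is not, whereas the second pattern finds "123abcfoo" but rejects "123999foo".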


@@ -5673,29 +5676,29 @@

          (?<=(?<!foo)bar)baz


-       matches  an occurrence of "baz" that is preceded by "bar" which in turn
+       matches an occurrence of "baz" that is preceded by "bar" which in  turn
        is not preceded by "foo", while


          (?<=\d{3}(?!999)...)foo


-       is another pattern that matches "foo" preceded by three digits and  any
+       is  another pattern that matches "foo" preceded by three digits and any
        three characters that are not "999".



CONDITIONAL SUBPATTERNS

-       It  is possible to cause the matching process to obey a subpattern con-
-       ditionally or to choose between two alternative subpatterns,  depending
-       on  the result of an assertion, or whether a specific capturing subpat-
-       tern has already been matched. The two possible  forms  of  conditional
+       It is possible to cause the matching process to obey a subpattern  con-
+       ditionally  or to choose between two alternative subpatterns, depending
+       on the result of an assertion, or whether a specific capturing  subpat-
+       tern  has  already  been matched. The two possible forms of conditional
        subpattern are:


          (?(condition)yes-pattern)
          (?(condition)yes-pattern|no-pattern)


-       If  the  condition is satisfied, the yes-pattern is used; otherwise the
-       no-pattern (if present) is used. If there are more  than  two  alterna-
-       tives  in  the subpattern, a compile-time error occurs. Each of the two
+       If the condition is satisfied, the yes-pattern is used;  otherwise  the
+       no-pattern  (if  present)  is used. If there are more than two alterna-
+       tives in the subpattern, a compile-time error occurs. Each of  the  two
        alternatives may itself contain nested subpatterns of any form, includ-
        ing  conditional  subpatterns;  the  restriction  to  two  alternatives
        applies only at the level of the condition. This pattern fragment is an
@@ -5704,73 +5707,73 @@
          (?(1) (A|B|C) | (D | (?(2)E|F) | E) )



-       There  are  four  kinds of condition: references to subpatterns, refer-
+       There are four kinds of condition: references  to  subpatterns,  refer-
        ences to recursion, a pseudo-condition called DEFINE, and assertions.


    Checking for a used subpattern by number


-       If the text between the parentheses consists of a sequence  of  digits,
+       If  the  text between the parentheses consists of a sequence of digits,
        the condition is true if a capturing subpattern of that number has pre-
-       viously matched. If there is more than one  capturing  subpattern  with
-       the  same  number  (see  the earlier section about duplicate subpattern
-       numbers), the condition is true if any of them have matched. An  alter-
-       native  notation is to precede the digits with a plus or minus sign. In
-       this case, the subpattern number is relative rather than absolute.  The
-       most  recently opened parentheses can be referenced by (?(-1), the next
-       most recent by (?(-2), and so on. Inside loops it can also  make  sense
+       viously  matched.  If  there is more than one capturing subpattern with
+       the same number (see the earlier  section  about  duplicate  subpattern
+       numbers),  the condition is true if any of them have matched. An alter-
+       native notation is to precede the digits with a plus or minus sign.  In
+       this  case, the subpattern number is relative rather than absolute. The
+       most recently opened parentheses can be referenced by (?(-1), the  next
+       most  recent  by (?(-2), and so on. Inside loops it can also make sense
        to refer to subsequent groups. The next parentheses to be opened can be
-       referenced as (?(+1), and so on. (The value zero in any of these  forms
+       referenced  as (?(+1), and so on. (The value zero in any of these forms
        is not used; it provokes a compile-time error.)


-       Consider  the  following  pattern, which contains non-significant white
+       Consider the following pattern, which  contains  non-significant  white
        space to make it more readable (assume the PCRE_EXTENDED option) and to
        divide it into three parts for ease of discussion:


          ( \( )?    [^()]+    (?(1) \) )


-       The  first  part  matches  an optional opening parenthesis, and if that
+       The first part matches an optional opening  parenthesis,  and  if  that
        character is present, sets it as the first captured substring. The sec-
-       ond  part  matches one or more characters that are not parentheses. The
-       third part is a conditional subpattern that tests whether  or  not  the
-       first  set  of  parentheses  matched.  If they did, that is, if subject
-       started with an opening parenthesis, the condition is true, and so  the
-       yes-pattern  is  executed and a closing parenthesis is required. Other-
-       wise, since no-pattern is not present, the subpattern matches  nothing.
-       In  other  words,  this  pattern matches a sequence of non-parentheses,
+       ond part matches one or more characters that are not  parentheses.  The
+       third  part  is  a conditional subpattern that tests whether or not the
+       first set of parentheses matched. If they  did,  that  is,  if  subject
+       started  with an opening parenthesis, the condition is true, and so the
+       yes-pattern is executed and a closing parenthesis is  required.  Other-
+       wise,  since no-pattern is not present, the subpattern matches nothing.
+       In other words, this pattern matches  a  sequence  of  non-parentheses,
        optionally enclosed in parentheses.
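
A minimal sketch of this behaviour follows, using Python's re module as a stand-in for PCRE. It assumes that re.VERBOSE is a fair substitute for PCRE_EXTENDED and that the numbered condition (?(1)...) behaves alike in both engines for this pattern; optional_parens and is_whole_match are illustrative names.

```python
import re

# re.VERBOSE plays the role of PCRE_EXTENDED: white space in the pattern
# is ignored, so the three parts can be spaced out as in the text.
optional_parens = re.compile(r"( \( )?    [^()]+    (?(1) \) )", re.VERBOSE)


def is_whole_match(subject):
    """Return True if the entire subject is matched by the pattern."""
    return optional_parens.fullmatch(subject) is not None
```

Here "abc" and "(abc)" are accepted in full, while "(abc" and "abc)" are not, because a closing parenthesis is required exactly when the opening one was captured.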


-       If you were embedding this pattern in a larger one,  you  could  use  a
+       If  you  were  embedding  this pattern in a larger one, you could use a
        relative reference:


          ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...


-       This  makes  the  fragment independent of the parentheses in the larger
+       This makes the fragment independent of the parentheses  in  the  larger
        pattern.


    Checking for a used subpattern by name


-       Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a
-       used  subpattern  by  name.  For compatibility with earlier versions of
-       PCRE, which had this facility before Perl, the syntax  (?(name)...)  is
-       also  recognized. However, there is a possible ambiguity with this syn-
-       tax, because subpattern names may  consist  entirely  of  digits.  PCRE
-       looks  first for a named subpattern; if it cannot find one and the name
-       consists entirely of digits, PCRE looks for a subpattern of  that  num-
-       ber,  which must be greater than zero. Using subpattern names that con-
+       Perl  uses  the  syntax  (?(<name>)...) or (?('name')...) to test for a
+       used subpattern by name. For compatibility  with  earlier  versions  of
+       PCRE,  which  had this facility before Perl, the syntax (?(name)...) is
+       also recognized. However, there is a possible ambiguity with this  syn-
+       tax,  because  subpattern  names  may  consist entirely of digits. PCRE
+       looks first for a named subpattern; if it cannot find one and the  name
+       consists  entirely  of digits, PCRE looks for a subpattern of that num-
+       ber, which must be greater than zero. Using subpattern names that  con-
        sist entirely of digits is not recommended.


        Rewriting the above example to use a named subpattern gives this:


          (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )


-       If the name used in a condition of this kind is a duplicate,  the  test
-       is  applied to all subpatterns of the same name, and is true if any one
+       If  the  name used in a condition of this kind is a duplicate, the test
+       is applied to all subpatterns of the same name, and is true if any  one
        of them has matched.
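
The named form can be sketched in Python's re module as well, with the caveat that Python spells it differently: the group is written (?P<OPEN>...) and the condition (?(OPEN)...). This is a translation for illustration, not PCRE syntax, and named_parens is an illustrative name.

```python
import re

# Python's spelling of the named-group condition; the PCRE and Perl forms
# (?<OPEN>...) and (?(<OPEN>)...) are not accepted by Python's re module
# in this combination.
named_parens = re.compile(
    r"(?P<OPEN> \( )?    [^()]+    (?(OPEN) \) )",
    re.VERBOSE,
)

with_parens = named_parens.fullmatch("(abc)")
without_parens = named_parens.fullmatch("abc")
unbalanced = named_parens.fullmatch("(abc")
```

The group OPEN is set only in the first case, which is what makes the condition true there and false in the second.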


    Checking for pattern recursion


        If the condition is the string (R), and there is no subpattern with the
-       name  R, the condition is true if a recursive call to the whole pattern
+       name R, the condition is true if a recursive call to the whole  pattern
        or any subpattern has been made. If digits or a name preceded by amper-
        sand follow the letter R, for example:


@@ -5778,51 +5781,51 @@

        the condition is true if the most recent recursion is into a subpattern
        whose number or name is given. This condition does not check the entire
-       recursion  stack.  If  the  name  used in a condition of this kind is a
+       recursion stack. If the name used in a condition  of  this  kind  is  a
        duplicate, the test is applied to all subpatterns of the same name, and
        is true if any one of them is the most recent recursion.


-       At  "top  level",  all  these recursion test conditions are false.  The
+       At "top level", all these recursion test  conditions  are  false.   The
        syntax for recursive patterns is described below.


    Defining subpatterns for use by reference only


-       If the condition is the string (DEFINE), and  there  is  no  subpattern
-       with  the  name  DEFINE,  the  condition is always false. In this case,
-       there may be only one alternative  in  the  subpattern.  It  is  always
-       skipped  if  control  reaches  this  point  in the pattern; the idea of
-       DEFINE is that it can be used to define subroutines that can be  refer-
-       enced  from elsewhere. (The use of subroutines is described below.) For
-       example, a pattern to match an IPv4 address  such  as  "192.168.23.245"
+       If  the  condition  is  the string (DEFINE), and there is no subpattern
+       with the name DEFINE, the condition is  always  false.  In  this  case,
+       there  may  be  only  one  alternative  in the subpattern. It is always
+       skipped if control reaches this point  in  the  pattern;  the  idea  of
+       DEFINE  is that it can be used to define subroutines that can be refer-
+       enced from elsewhere. (The use of subroutines is described below.)  For
+       example,  a  pattern  to match an IPv4 address such as "192.168.23.245"
        could be written like this (ignore whitespace and line breaks):


          (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
          \b (?&byte) (\.(?&byte)){3} \b


-       The  first part of the pattern is a DEFINE group inside which  another
-       group named "byte" is defined. This matches an individual component  of
-       an  IPv4  address  (a number less than 256). When matching takes place,
-       this part of the pattern is skipped because DEFINE acts  like  a  false
-       condition.  The  rest of the pattern uses references to the named group
-       to match the four dot-separated components of an IPv4 address,  insist-
+       The first part of the pattern is a DEFINE group inside  which  another
+       group  named "byte" is defined. This matches an individual component of
+       an IPv4 address (a number less than 256). When  matching  takes  place,
+       this  part  of  the pattern is skipped because DEFINE acts like a false
+       condition. The rest of the pattern uses references to the  named  group
+       to  match the four dot-separated components of an IPv4 address, insist-
        ing on a word boundary at each end.
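
Python's re module has no DEFINE condition and no subroutine calls, so the sketch below swaps in a different technique: the "byte" sub-expression is kept in a string and spliced into the full pattern. The alternatives are copied from the pattern above; BYTE, IPV4 and find_ipv4 are illustrative names.

```python
import re

# The same alternatives as the "byte" group, written once and reused by
# string substitution instead of by a (?&byte) subroutine call.
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b")


def find_ipv4(text):
    """Return the first IPv4-like address in text, or None."""
    m = IPV4.search(text)
    return m.group(0) if m else None
```

Calling find_ipv4("host 192.168.23.245 up") returns "192.168.23.245", while "256.1.1.1" yields no match because 256 is not a valid component.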


    Assertion conditions


-       If  the  condition  is  not  in any of the above formats, it must be an
-       assertion.  This may be a positive or negative lookahead or  lookbehind
-       assertion.  Consider  this  pattern,  again  containing non-significant
+       If the condition is not in any of the above  formats,  it  must  be  an
+       assertion.   This may be a positive or negative lookahead or lookbehind
+       assertion. Consider  this  pattern,  again  containing  non-significant
        white space, and with the two alternatives on the second line:


          (?(?=[^a-z]*[a-z])
          \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )


-       The condition  is  a  positive  lookahead  assertion  that  matches  an
-       optional  sequence of non-letters followed by a letter. In other words,
-       it tests for the presence of at least one letter in the subject.  If  a
-       letter  is found, the subject is matched against the first alternative;
-       otherwise it is  matched  against  the  second.  This  pattern  matches
-       strings  in  one  of the two forms dd-aaa-dd or dd-dd-dd, where aaa are
+       The  condition  is  a  positive  lookahead  assertion  that  matches an
+       optional sequence of non-letters followed by a letter. In other  words,
+       it  tests  for the presence of at least one letter in the subject. If a
+       letter is found, the subject is matched against the first  alternative;
+       otherwise  it  is  matched  against  the  second.  This pattern matches
+       strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
        letters and dd are digits.
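
Python's re module does not accept an assertion as a condition, so the following sketch emulates the pattern rather than reproducing it: each alternative is guarded by the lookahead or by its negation, which should select the same branch as the conditional does. The name date_like is illustrative.

```python
import re

date_like = re.compile(
    r"""
    (?: (?=[^a-z]*[a-z])  \d{2}-[a-z]{3}-\d{2}   # a letter lies ahead
      | (?![^a-z]*[a-z])  \d{2}-\d{2}-\d{2}      # no letter lies ahead
    )
    """,
    re.VERBOSE,
)

letters_form = date_like.match("12-abc-34")
digits_form = date_like.match("12-34-56")
# A letter later in the subject selects the first alternative, which then
# fails; the second alternative is not tried.
wrong_branch = date_like.match("12-34-56 x")
```

The last case shows that the choice of alternative depends on the whole remainder of the subject, not just on the part the alternative would match.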



@@ -5831,41 +5834,41 @@
        There are two ways of including comments in patterns that are processed
        by PCRE. In both cases, the start of the comment must not be in a char-
        acter class, nor in the middle of any other sequence of related charac-
-       ters  such  as  (?: or a subpattern name or number. The characters that
+       ters such as (?: or a subpattern name or number.  The  characters  that
        make up a comment play no part in the pattern matching.


-       The sequence (?# marks the start of a comment that continues up to  the
-       next  closing parenthesis. Nested parentheses are not permitted. If the
+       The  sequence (?# marks the start of a comment that continues up to the
+       next closing parenthesis. Nested parentheses are not permitted. If  the
        PCRE_EXTENDED option is set, an unescaped # character also introduces a
-       comment,  which  in  this  case continues to immediately after the next
-       newline character or character sequence in the pattern.  Which  charac-
+       comment, which in this case continues to  immediately  after  the  next
+       newline  character  or character sequence in the pattern. Which charac-
        ters are interpreted as newlines is controlled by the options passed to
-       a compiling function or by a special sequence at the start of the  pat-
+       a  compiling function or by a special sequence at the start of the pat-
        tern, as described in the section entitled "Newline conventions" above.
        Note that the end of this type of comment is a literal newline sequence
-       in  the pattern; escape sequences that happen to represent a newline do
-       not count. For example, consider this  pattern  when  PCRE_EXTENDED  is
+       in the pattern; escape sequences that happen to represent a newline  do
+       not  count.  For  example,  consider this pattern when PCRE_EXTENDED is
        set, and the default newline convention is in force:


          abc #comment \n still comment


-       On  encountering  the  # character, pcre_compile() skips along, looking
-       for a newline in the pattern. The sequence \n is still literal at  this
-       stage,  so  it does not terminate the comment. Only an actual character
+       On encountering the # character, pcre_compile()  skips  along,  looking
+       for  a newline in the pattern. The sequence \n is still literal at this
+       stage, so it does not terminate the comment. Only an  actual  character
        with the code value 0x0a (the default newline) does so.
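
The same distinction can be observed in Python's re module, whose re.VERBOSE comments also end only at a real newline character. This is offered as an analogy under that assumption, not as a test of pcre_compile(); the names escaped, literal and inline are illustrative.

```python
import re

# Raw string: "\n" is a backslash followed by "n", so the comment runs on
# to the end of the pattern and only "abc" remains.
escaped = re.compile(r"abc #comment \n still comment", re.VERBOSE)

# Ordinary string: "\n" is a real newline character, which ends the
# comment, so the pattern is "abc" followed by "def".
literal = re.compile("abc #comment \n def", re.VERBOSE)

# The (?#...) form needs no option and ends at the next closing parenthesis.
inline = re.compile(r"abc(?# a comment)def")
```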



RECURSIVE PATTERNS

-       Consider the problem of matching a string in parentheses, allowing  for
-       unlimited  nested  parentheses.  Without the use of recursion, the best
-       that can be done is to use a pattern that  matches  up  to  some  fixed
-       depth  of  nesting.  It  is not possible to handle an arbitrary nesting
+       Consider  the problem of matching a string in parentheses, allowing for
+       unlimited nested parentheses. Without the use of  recursion,  the  best
+       that  can  be  done  is  to use a pattern that matches up to some fixed
+       depth of nesting. It is not possible to  handle  an  arbitrary  nesting
        depth.


        For some time, Perl has provided a facility that allows regular expres-
-       sions  to recurse (amongst other things). It does this by interpolating
-       Perl code in the expression at run time, and the code can refer to  the
+       sions to recurse (amongst other things). It does this by  interpolating
+       Perl  code in the expression at run time, and the code can refer to the
        expression itself. A Perl pattern using code interpolation to solve the
        parentheses problem can be created like this:


@@ -5875,201 +5878,201 @@
        refers recursively to the pattern in which it appears.


        Obviously, PCRE cannot support the interpolation of Perl code. Instead,
-       it supports special syntax for recursion of  the  entire  pattern,  and
-       also  for  individual  subpattern  recursion. After its introduction in
-       PCRE and Python, this kind of  recursion  was  subsequently  introduced
+       it  supports  special  syntax  for recursion of the entire pattern, and
+       also for individual subpattern recursion.  After  its  introduction  in
+       PCRE  and  Python,  this  kind of recursion was subsequently introduced
        into Perl at release 5.10.


-       A  special  item  that consists of (? followed by a number greater than
-       zero and a closing parenthesis is a recursive subroutine  call  of  the
-       subpattern  of  the  given  number, provided that it occurs inside that
-       subpattern. (If not, it is a non-recursive subroutine  call,  which  is
-       described  in  the  next  section.)  The special item (?R) or (?0) is a
+       A special item that consists of (? followed by a  number  greater  than
+       zero  and  a  closing parenthesis is a recursive subroutine call of the
+       subpattern of the given number, provided that  it  occurs  inside  that
+       subpattern.  (If  not,  it is a non-recursive subroutine call, which is
+       described in the next section.) The special item  (?R)  or  (?0)  is  a
        recursive call of the entire regular expression.


-       This PCRE pattern solves the nested  parentheses  problem  (assume  the
+       This  PCRE  pattern  solves  the nested parentheses problem (assume the
        PCRE_EXTENDED option is set so that white space is ignored):


          \( ( [^()]++ | (?R) )* \)


-       First  it matches an opening parenthesis. Then it matches any number of
-       substrings which can either be a  sequence  of  non-parentheses,  or  a
-       recursive  match  of the pattern itself (that is, a correctly parenthe-
+       First it matches an opening parenthesis. Then it matches any number  of
+       substrings  which  can  either  be  a sequence of non-parentheses, or a
+       recursive match of the pattern itself (that is, a  correctly  parenthe-
        sized substring).  Finally there is a closing parenthesis. Note the use
        of a possessive quantifier to avoid backtracking into sequences of non-
        parentheses.
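
Python's re module has no recursion item, so the sketch below is a hand-written function that accepts the same strings as the pattern when matching starts at a given offset. It is an illustration of what the recursion does, not of how PCRE implements it; match_parens is an illustrative name.

```python
def match_parens(subject, start=0):
    """Return the index just past a balanced group opening at start, or None."""
    if start >= len(subject) or subject[start] != "(":
        return None
    pos = start + 1
    while pos < len(subject):
        ch = subject[pos]
        if ch == ")":
            return pos + 1                    # the closing \) of this level
        if ch == "(":
            pos = match_parens(subject, pos)  # corresponds to (?R)
            if pos is None:
                return None
        else:
            pos += 1                          # part of a [^()]++ run
    return None                               # input ended before the close
```

For example, match_parens("(ab(cd)ef)") returns 10, the length of the whole string, and match_parens("(a(b)") returns None because the outer parenthesis is never closed.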


-       If this were part of a larger pattern, you would not  want  to  recurse
+       If  this  were  part of a larger pattern, you would not want to recurse
        the entire pattern, so instead you could use this:


          ( \( ( [^()]++ | (?1) )* \) )


-       We  have  put the pattern into parentheses, and caused the recursion to
+       We have put the pattern into parentheses, and caused the  recursion  to
        refer to them instead of the whole pattern.


-       In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
-       tricky.  This is made easier by the use of relative references. Instead
+       In  a  larger  pattern,  keeping  track  of  parenthesis numbers can be
+       tricky. This is made easier by the use of relative references.  Instead
        of (?1) in the pattern above you can write (?-2) to refer to the second
-       most  recently  opened  parentheses  preceding  the recursion. In other
-       words, a negative number counts capturing  parentheses  leftwards  from
+       most recently opened parentheses  preceding  the  recursion.  In  other
+       words,  a  negative  number counts capturing parentheses leftwards from
        the point at which it is encountered.


-       It  is  also  possible  to refer to subsequently opened parentheses, by
-       writing references such as (?+2). However, these  cannot  be  recursive
-       because  the  reference  is  not inside the parentheses that are refer-
-       enced. They are always non-recursive subroutine calls, as described  in
+       It is also possible to refer to  subsequently  opened  parentheses,  by
+       writing  references  such  as (?+2). However, these cannot be recursive
+       because the reference is not inside the  parentheses  that  are  refer-
+       enced.  They are always non-recursive subroutine calls, as described in
        the next section.


-       An  alternative  approach is to use named parentheses instead. The Perl
-       syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also
+       An alternative approach is to use named parentheses instead.  The  Perl
+       syntax  for  this  is (?&name); PCRE's earlier syntax (?P>name) is also
        supported. We could rewrite the above example as follows:


          (?<pn> \( ( [^()]++ | (?&pn) )* \) )


-       If  there  is more than one subpattern with the same name, the earliest
+       If there is more than one subpattern with the same name,  the  earliest
        one is used.


-       This particular example pattern that we have been looking  at  contains
+       This  particular  example pattern that we have been looking at contains
        nested unlimited repeats, and so the use of a possessive quantifier for
        matching strings of non-parentheses is important when applying the pat-
-       tern  to  strings  that do not match. For example, when this pattern is
+       tern to strings that do not match. For example, when  this  pattern  is
        applied to


          (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()


-       it yields "no match" quickly. However, if a  possessive  quantifier  is
-       not  used, the match runs for a very long time indeed because there are
-       so many different ways the + and * repeats can carve  up  the  subject,
+       it  yields  "no  match" quickly. However, if a possessive quantifier is
+       not used, the match runs for a very long time indeed because there  are
+       so  many  different  ways the + and * repeats can carve up the subject,
        and all have to be tested before failure can be reported.


-       At  the  end  of a match, the values of capturing parentheses are those
-       from the outermost level. If you want to obtain intermediate values,  a
-       callout  function can be used (see below and the pcrecallout documenta-
+       At the end of a match, the values of capturing  parentheses  are  those
+       from  the outermost level. If you want to obtain intermediate values, a
+       callout function can be used (see below and the pcrecallout  documenta-
        tion). If the pattern above is matched against


          (ab(cd)ef)


-       the value for the inner capturing parentheses  (numbered  2)  is  "ef",
-       which  is the last value taken on at the top level. If a capturing sub-
-       pattern is not matched at the top level, its final  captured  value  is
-       unset,  even  if  it was (temporarily) set at a deeper level during the
+       the  value  for  the  inner capturing parentheses (numbered 2) is "ef",
+       which is the last value taken on at the top level. If a capturing  sub-
+       pattern  is  not  matched at the top level, its final captured value is
+       unset, even if it was (temporarily) set at a deeper  level  during  the
        matching process.


-       If there are more than 15 capturing parentheses in a pattern, PCRE  has
-       to  obtain extra memory to store data during a recursion, which it does
+       If  there are more than 15 capturing parentheses in a pattern, PCRE has
+       to obtain extra memory to store data during a recursion, which it  does
        by using pcre_malloc, freeing it via pcre_free afterwards. If no memory
        can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.


-       Do  not  confuse  the (?R) item with the condition (R), which tests for
-       recursion.  Consider this pattern, which matches text in  angle  brack-
-       ets,  allowing for arbitrary nesting. Only digits are allowed in nested
-       brackets (that is, when recursing), whereas any characters are  permit-
+       Do not confuse the (?R) item with the condition (R),  which  tests  for
+       recursion.   Consider  this pattern, which matches text in angle brack-
+       ets, allowing for arbitrary nesting. Only digits are allowed in  nested
+       brackets  (that is, when recursing), whereas any characters are permit-
        ted at the outer level.


          < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >


-       In  this  pattern, (?(R) is the start of a conditional subpattern, with
-       two different alternatives for the recursive and  non-recursive  cases.
+       In this pattern, (?(R) is the start of a conditional  subpattern,  with
+       two  different  alternatives for the recursive and non-recursive cases.
        The (?R) item is the actual recursive call.


    Differences in recursion processing between PCRE and Perl


-       Recursion  processing  in PCRE differs from Perl in two important ways.
-       In PCRE (like Python, but unlike Perl), a recursive subpattern call  is
+       Recursion processing in PCRE differs from Perl in two  important  ways.
+       In  PCRE (like Python, but unlike Perl), a recursive subpattern call is
        always treated as an atomic group. That is, once it has matched some of
        the subject string, it is never re-entered, even if it contains untried
-       alternatives  and  there  is a subsequent matching failure. This can be
-       illustrated by the following pattern, which purports to match a  palin-
-       dromic  string  that contains an odd number of characters (for example,
+       alternatives and there is a subsequent matching failure.  This  can  be
+       illustrated  by the following pattern, which purports to match a palin-
+       dromic string that contains an odd number of characters  (for  example,
        "a", "aba", "abcba", "abcdcba"):


          ^(.|(.)(?1)\2)$


        The idea is that it either matches a single character, or two identical
-       characters  surrounding  a sub-palindrome. In Perl, this pattern works;
-       in PCRE it does not if the subject is  longer  than  three  characters.
+       characters surrounding a sub-palindrome. In Perl, this  pattern  works;
+       in  PCRE  it  does  not if the subject is longer than three characters.
        Consider the subject string "abcba":


-       At  the  top level, the first character is matched, but as it is not at
+       At the top level, the first character is matched, but as it is  not  at
        the end of the string, the first alternative fails; the second alterna-
        tive is taken and the recursion kicks in. The recursive call to subpat-
-       tern 1 successfully matches the next character ("b").  (Note  that  the
+       tern  1  successfully  matches the next character ("b"). (Note that the
        beginning and end of line tests are not part of the recursion).


-       Back  at  the top level, the next character ("c") is compared with what
-       subpattern 2 matched, which was "a". This fails. Because the  recursion
-       is  treated  as  an atomic group, there are now no backtracking points,
-       and so the entire match fails. (Perl is able, at  this  point,  to  re-
-       enter  the  recursion  and try the second alternative.) However, if the
+       Back at the top level, the next character ("c") is compared  with  what
+       subpattern  2 matched, which was "a". This fails. Because the recursion
+       is treated as an atomic group, there are now  no  backtracking  points,
+       and  so  the  entire  match fails. (Perl is able, at this point, to re-
+       enter the recursion and try the second alternative.)  However,  if  the
        pattern is written with the alternatives in the other order, things are
        different:


          ^((.)(?1)\2|.)$


-       This  time,  the recursing alternative is tried first, and continues to
-       recurse until it runs out of characters, at which point  the  recursion
-       fails.  But  this  time  we  do  have another alternative to try at the
-       higher level. That is the big difference:  in  the  previous  case  the
+       This time, the recursing alternative is tried first, and  continues  to
+       recurse  until  it runs out of characters, at which point the recursion
+       fails. But this time we do have  another  alternative  to  try  at  the
+       higher  level.  That  is  the  big difference: in the previous case the
        remaining alternative is at a deeper recursion level, which PCRE cannot
        use.


-       To change the pattern so that it matches all palindromic  strings,  not
-       just  those  with an odd number of characters, it is tempting to change
+       To  change  the pattern so that it matches all palindromic strings, not
+       just those with an odd number of characters, it is tempting  to  change
        the pattern to this:


          ^((.)(?1)\2|.?)$


-       Again, this works in Perl, but not in PCRE, and for  the  same  reason.
-       When  a  deeper  recursion has matched a single character, it cannot be
-       entered again in order to match an empty string.  The  solution  is  to
-       separate  the two cases, and write out the odd and even cases as alter-
+       Again,  this  works  in Perl, but not in PCRE, and for the same reason.
+       When a deeper recursion has matched a single character,  it  cannot  be
+       entered  again  in  order  to match an empty string. The solution is to
+       separate the two cases, and write out the odd and even cases as  alter-
        natives at the higher level:


          ^(?:((.)(?1)\2|)|((.)(?3)\4|.))


-       If you want to match typical palindromic phrases, the  pattern  has  to
+       If  you  want  to match typical palindromic phrases, the pattern has to
        ignore all non-word characters, which can be done like this:


          ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$


        If run with the PCRE_CASELESS option, this pattern matches phrases such
        as "A man, a plan, a canal: Panama!" and it works well in both PCRE and
-       Perl.  Note the use of the possessive quantifier *+ to avoid backtrack-
-       ing into sequences of non-word characters. Without this, PCRE  takes  a
-       great  deal  longer  (ten  times or more) to match typical phrases, and
+       Perl. Note the use of the possessive quantifier *+ to avoid  backtrack-
+       ing  into  sequences of non-word characters. Without this, PCRE takes a
+       great deal longer (ten times or more) to  match  typical  phrases,  and
        Perl takes so long that you think it has gone into a loop.


-       WARNING: The palindrome-matching patterns above work only if  the  sub-
-       ject  string  does not start with a palindrome that is shorter than the
-       entire string.  For example, although "abcba" is correctly matched,  if
-       the  subject  is "ababa", PCRE finds the palindrome "aba" at the start,
-       then fails at top level because the end of the string does not  follow.
-       Once  again, it cannot jump back into the recursion to try other alter-
+       WARNING:  The  palindrome-matching patterns above work only if the sub-
+       ject string does not start with a palindrome that is shorter  than  the
+       entire  string.  For example, although "abcba" is correctly matched, if
+       the subject is "ababa", PCRE finds the palindrome "aba" at  the  start,
+       then  fails at top level because the end of the string does not follow.
+       Once again, it cannot jump back into the recursion to try other  alter-
        natives, so the entire match fails.
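
The failure on "ababa" can be reproduced with a small toy model. This is only a sketch, not PCRE code: it hand-simulates the pattern ^((.)(?1)\2|.?)$ under the assumption that each recursive call is atomic (only its first successful result is ever used). The names group_end and toy_match are invented for illustration.

```python
def group_end(subject, i):
    """Offset at which one atomic call of the group first succeeds from i."""
    if i < len(subject):
        # First alternative: (.)(?1)\2
        j = group_end(subject, i + 1)   # recursive call, locked to its first result
        if j < len(subject) and subject[j] == subject[i]:
            return j + 1
        return i + 1                    # second alternative: .? takes one character
    return i                            # second alternative: .? matches empty


def toy_match(subject):
    """Top level: alternatives here may still be retried, recursions may not."""
    n = len(subject)
    ends = [0]                          # .? matching the empty string
    if n > 0:
        ends.append(1)                  # .? matching one character
        j = group_end(subject, 1)
        if j < n and subject[j] == subject[0]:
            ends.append(j + 1)          # first alternative succeeded
    return n in ends                    # $ must follow one of the candidates


print(toy_match("abcba"), toy_match("ababa"), toy_match("abba"))
```

In this model "abcba" matches, "ababa" stops at offset 3 (after "aba") and fails, and the even-length "abba" fails as well, in line with the behaviour described above.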


-       The second way in which PCRE and Perl differ in  their  recursion  pro-
-       cessing  is in the handling of captured values. In Perl, when a subpat-
-       tern is called recursively or as a subpattern (see the  next  section),
-       it  has  no  access to any values that were captured outside the recur-
-       sion, whereas in PCRE these values can  be  referenced.  Consider  this
+       The  second  way  in which PCRE and Perl differ in their recursion pro-
+       cessing is in the handling of captured values. In Perl, when a  subpat-
+       tern  is  called recursively or as a subpattern (see the next section),
+       it has no access to any values that were captured  outside  the  recur-
+       sion,  whereas  in  PCRE  these values can be referenced. Consider this
        pattern:


          ^(.)(\1|a(?2))


-       In  PCRE,  this  pattern matches "bab". The first capturing parentheses
-       match "b", then in the second group, when the back reference  \1  fails
-       to  match "b", the second alternative matches "a" and then recurses. In
-       the recursion, \1 does now match "b" and so the whole  match  succeeds.
-       In  Perl,  the pattern fails to match because inside the recursive call
+       In PCRE, this pattern matches "bab". The  first  capturing  parentheses
+       match  "b",  then in the second group, when the back reference \1 fails
+       to match "b", the second alternative matches "a" and then recurses.  In
+       the  recursion,  \1 does now match "b" and so the whole match succeeds.
+       In Perl, the pattern fails to match because inside the  recursive  call
        \1 cannot access the externally set value.



SUBPATTERNS AS SUBROUTINES

-       If the syntax for a recursive subpattern call (either by number  or  by
-       name)  is  used outside the parentheses to which it refers, it operates
-       like a subroutine in a programming language. The called subpattern  may
-       be  defined  before or after the reference. A numbered reference can be
+       If  the  syntax for a recursive subpattern call (either by number or by
+       name) is used outside the parentheses to which it refers,  it  operates
+       like  a subroutine in a programming language. The called subpattern may
+       be defined before or after the reference. A numbered reference  can  be
        absolute or relative, as in these examples:


          (...(absolute)...)...(?2)...
@@ -6080,187 +6083,187 @@


          (sens|respons)e and \1ibility


-       matches "sense and sensibility" and "response and responsibility",  but
+       matches  "sense and sensibility" and "response and responsibility", but
        not "sense and responsibility". If instead the pattern


          (sens|respons)e and (?1)ibility


-       is  used, it does match "sense and responsibility" as well as the other
-       two strings. Another example is  given  in  the  discussion  of  DEFINE
+       is used, it does match "sense and responsibility" as well as the  other
+       two  strings.  Another  example  is  given  in the discussion of DEFINE
        above.
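
The difference between the back reference and the subroutine call can be sketched with Python's standard re module. Python's re has no (?1) syntax, so the second pattern below writes the subpattern out a second time; that is an assumed stand-in for the subroutine call, not PCRE's own mechanism. The helper name accepts is hypothetical.

```python
import re

# Back reference: \1 must repeat the text that group 1 actually matched.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Stand-in for (?1): the subpattern itself is applied again, so either
# alternative may match the second time.
inlined = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")


def accepts(pattern, text):
    return pattern.fullmatch(text) is not None


for text in ("sense and sensibility", "sense and responsibility"):
    print(text, accepts(backref, text), accepts(inlined, text))
```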


-       All  subroutine  calls, whether recursive or not, are always treated as
-       atomic groups. That is, once a subroutine has matched some of the  sub-
+       All subroutine calls, whether recursive or not, are always  treated  as
+       atomic  groups. That is, once a subroutine has matched some of the sub-
        ject string, it is never re-entered, even if it contains untried alter-
-       natives and there is  a  subsequent  matching  failure.  Any  capturing
-       parentheses  that  are  set  during the subroutine call revert to their
+       natives  and  there  is  a  subsequent  matching failure. Any capturing
+       parentheses that are set during the subroutine  call  revert  to  their
        previous values afterwards.


-       Processing options such as case-independence are fixed when  a  subpat-
-       tern  is defined, so if it is used as a subroutine, such options cannot
+       Processing  options  such as case-independence are fixed when a subpat-
+       tern is defined, so if it is used as a subroutine, such options  cannot
        be changed for different calls. For example, consider this pattern:


          (abc)(?i:(?-1))


-       It matches "abcabc". It does not match "abcABC" because the  change  of
+       It  matches  "abcabc". It does not match "abcABC" because the change of
        processing option does not affect the called subpattern.



ONIGURUMA SUBROUTINE SYNTAX

-       For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
+       For compatibility with Oniguruma, the non-Perl syntax \g followed by  a
        name or a number enclosed either in angle brackets or single quotes, is
-       an  alternative  syntax  for  referencing a subpattern as a subroutine,
-       possibly recursively. Here are two of the examples used above,  rewrit-
+       an alternative syntax for referencing a  subpattern  as  a  subroutine,
+       possibly  recursively. Here are two of the examples used above, rewrit-
        ten using this syntax:


          (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
          (sens|respons)e and \g'1'ibility


-       PCRE  supports  an extension to Oniguruma: if a number is preceded by a
+       PCRE supports an extension to Oniguruma: if a number is preceded  by  a
        plus or a minus sign it is taken as a relative reference. For example:


          (abc)(?i:\g<-1>)


-       Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not
-       synonymous.  The former is a back reference; the latter is a subroutine
+       Note  that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not
+       synonymous. The former is a back reference; the latter is a  subroutine
        call.



CALLOUTS

        Perl has a feature whereby using the sequence (?{...}) causes arbitrary
-       Perl  code to be obeyed in the middle of matching a regular expression.
+       Perl code to be obeyed in the middle of matching a regular  expression.
        This makes it possible, amongst other things, to extract different sub-
        strings that match the same pair of parentheses when there is a repeti-
        tion.


        PCRE provides a similar feature, but of course it cannot obey arbitrary
        Perl code. The feature is called "callout". The caller of PCRE provides
-       an external function by putting its entry point in the global  variable
-       pcre_callout  (8-bit  library)  or  pcre16_callout (16-bit library). By
+       an  external function by putting its entry point in the global variable
+       pcre_callout (8-bit library) or  pcre16_callout  (16-bit  library).  By
        default, this variable contains NULL, which disables all calling out.


-       Within a regular expression, (?C) indicates the  points  at  which  the
-       external  function  is  to be called. If you want to identify different
-       callout points, you can put a number less than 256 after the letter  C.
-       The  default  value is zero.  For example, this pattern has two callout
+       Within  a  regular  expression,  (?C) indicates the points at which the
+       external function is to be called. If you want  to  identify  different
+       callout  points, you can put a number less than 256 after the letter C.
+       The default value is zero.  For example, this pattern has  two  callout
        points:


          (?C1)abc(?C2)def


-       If the PCRE_AUTO_CALLOUT flag is passed to a compiling function,  call-
-       outs  are automatically installed before each item in the pattern. They
+       If  the PCRE_AUTO_CALLOUT flag is passed to a compiling function, call-
+       outs are automatically installed before each item in the pattern.  They
        are all numbered 255.


-       During matching, when PCRE reaches a callout point, the external  func-
-       tion  is  called.  It  is  provided with the number of the callout, the
-       position in the pattern, and, optionally, one item of  data  originally
-       supplied  by  the caller of the matching function. The callout function
-       may cause matching to proceed, to backtrack, or to fail  altogether.  A
-       complete  description of the interface to the callout function is given
+       During  matching, when PCRE reaches a callout point, the external func-
+       tion is called. It is provided with the  number  of  the  callout,  the
+       position  in  the pattern, and, optionally, one item of data originally
+       supplied by the caller of the matching function. The  callout  function
+       may  cause  matching to proceed, to backtrack, or to fail altogether. A
+       complete description of the interface to the callout function is  given
        in the pcrecallout documentation.



BACKTRACKING CONTROL

-       Perl 5.10 introduced a number of "Special Backtracking Control  Verbs",
+       Perl  5.10 introduced a number of "Special Backtracking Control Verbs",
        which are described in the Perl documentation as "experimental and sub-
-       ject to change or removal in a future version of Perl". It goes  on  to
-       say:  "Their usage in production code should be noted to avoid problems
+       ject  to  change or removal in a future version of Perl". It goes on to
+       say: "Their usage in production code should be noted to avoid  problems
        during upgrades." The same remarks apply to the PCRE features described
        in this section.


-       Since  these  verbs  are  specifically related to backtracking, most of
-       them can be used only when the pattern is to be matched  using  one  of
+       Since these verbs are specifically related  to  backtracking,  most  of
+       them  can  be  used only when the pattern is to be matched using one of
        the traditional matching functions, which use a backtracking algorithm.
-       With the exception of (*FAIL), which behaves like  a  failing  negative
-       assertion,  they  cause an error if encountered by a DFA matching func-
+       With  the  exception  of (*FAIL), which behaves like a failing negative
+       assertion, they cause an error if encountered by a DFA  matching  func-
        tion.


-       If any of these verbs are used in an assertion or in a subpattern  that
+       If  any of these verbs are used in an assertion or in a subpattern that
        is called as a subroutine (whether or not recursively), their effect is
        confined to that subpattern; it does not extend to the surrounding pat-
        tern, with one exception: the name from a (*MARK), (*PRUNE), or (*THEN)
-       that is encountered in a successful positive assertion is  passed  back
-       when  a  match  succeeds (compare capturing parentheses in assertions).
+       that  is  encountered in a successful positive assertion is passed back
+       when a match succeeds (compare capturing  parentheses  in  assertions).
        Note that such subpatterns are processed as anchored at the point where
        they are tested. Note also that Perl's treatment of subroutines is dif-
        ferent in some cases.


-       The new verbs make use of what was previously invalid syntax: an  open-
+       The  new verbs make use of what was previously invalid syntax: an open-
        ing parenthesis followed by an asterisk. They are generally of the form
-       (*VERB) or (*VERB:NAME). Some may take either form, with differing  be-
-       haviour,  depending on whether or not an argument is present. A name is
+       (*VERB)  or (*VERB:NAME). Some may take either form, with differing be-
+       haviour, depending on whether or not an argument is present. A name  is
        any sequence of characters that does not include a closing parenthesis.
-       If  the  name is empty, that is, if the closing parenthesis immediately
-       follows the colon, the effect is as if the colon were  not  there.  Any
+       If the name is empty, that is, if the closing  parenthesis  immediately
+       follows  the  colon,  the effect is as if the colon were not there. Any
        number of these verbs may occur in a pattern.


    Optimizations that affect backtracking verbs


-       PCRE  contains some optimizations that are used to speed up matching by
+       PCRE contains some optimizations that are used to speed up matching  by
        running some checks at the start of each match attempt. For example, it
-       may  know  the minimum length of matching subject, or that a particular
-       character must be present. When one of these  optimizations  suppresses
-       the  running  of  a match, any included backtracking verbs will not, of
+       may know the minimum length of matching subject, or that  a  particular
+       character  must  be present. When one of these optimizations suppresses
+       the running of a match, any included backtracking verbs  will  not,  of
        course, be processed. You can suppress the start-of-match optimizations
-       by  setting  the  PCRE_NO_START_OPTIMIZE  option when calling pcre_com-
+       by setting the PCRE_NO_START_OPTIMIZE  option  when  calling  pcre_com-
        pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT).
        There is more discussion of this option in the section entitled "Option
        bits for pcre_exec()" in the pcreapi documentation.


-       Experiments with Perl suggest that it too  has  similar  optimizations,
+       Experiments  with  Perl  suggest that it too has similar optimizations,
        sometimes leading to anomalous results.


    Verbs that act immediately


-       The  following  verbs act as soon as they are encountered. They may not
+       The following verbs act as soon as they are encountered. They  may  not
        be followed by a name.


           (*ACCEPT)


-       This verb causes the match to end successfully, skipping the  remainder
-       of  the pattern. However, when it is inside a subpattern that is called
-       as a subroutine, only that subpattern is ended  successfully.  Matching
-       then  continues  at  the  outer level. If (*ACCEPT) is inside capturing
+       This  verb causes the match to end successfully, skipping the remainder
+       of the pattern. However, when it is inside a subpattern that is  called
+       as  a  subroutine, only that subpattern is ended successfully. Matching
+       then continues at the outer level. If  (*ACCEPT)  is  inside  capturing
        parentheses, the data so far is captured. For example:


          A((?:A|B(*ACCEPT)|C)D)


-       This matches "AB", "AAD", or "ACD"; when it matches "AB", "B"  is  cap-
+       This  matches  "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
        tured by the outer parentheses.


          (*FAIL) or (*F)


-       This  verb causes a matching failure, forcing backtracking to occur. It
-       is equivalent to (?!) but easier to read. The Perl documentation  notes
-       that  it  is  probably  useful only when combined with (?{}) or (??{}).
-       Those are, of course, Perl features that are not present in  PCRE.  The
-       nearest  equivalent is the callout feature, as for example in this pat-
+       This verb causes a matching failure, forcing backtracking to occur.  It
+       is  equivalent to (?!) but easier to read. The Perl documentation notes
+       that it is probably useful only when combined  with  (?{})  or  (??{}).
+       Those  are,  of course, Perl features that are not present in PCRE. The
+       nearest equivalent is the callout feature, as for example in this  pat-
        tern:


          a+(?C)(*FAIL)


-       A match with the string "aaaa" always fails, but the callout  is  taken
+       A  match  with the string "aaaa" always fails, but the callout is taken
        before each backtrack happens (in this example, 10 times).
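
The figure of 10 can be checked by counting. The sketch below is plain arithmetic, not a call into PCRE, and assumes that no start-of-match optimization removes any attempt: for every starting offset, a+ can end at each position within the run of "a" characters, and the callout is reached once for each such ending. The function name callout_count is invented here.

```python
def callout_count(subject):
    """Count how often (?C) would be reached for a+(?C)(*FAIL)."""
    total = 0
    for start in range(len(subject)):
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        total += run        # one callout per possible length of a+
    return total


print(callout_count("aaaa"))    # 4 + 3 + 2 + 1
```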


    Recording which path was taken


-       There  is  one  verb  whose  main  purpose  is to track how a match was
-       arrived at, though it also has a  secondary  use  in  conjunction  with
+       There is one verb whose main purpose  is  to  track  how  a  match  was
+       arrived  at,  though  it  also  has a secondary use in conjunction with
        advancing the match starting point (see (*SKIP) below).


          (*MARK:NAME) or (*:NAME)


-       A  name  is  always  required  with  this  verb.  There  may be as many
-       instances of (*MARK) as you like in a pattern, and their names  do  not
+       A name is always  required  with  this  verb.  There  may  be  as  many
+       instances  of  (*MARK) as you like in a pattern, and their names do not
        have to be unique.


-       When  a match succeeds, the name of the last-encountered (*MARK) on the
-       matching path is passed back to the caller as described in the  section
-       entitled  "Extra  data  for  pcre_exec()" in the pcreapi documentation.
-       Here is an example of pcretest output, where the /K  modifier  requests
+       When a match succeeds, the name of the last-encountered (*MARK) on  the
+       matching  path is passed back to the caller as described in the section
+       entitled "Extra data for pcre_exec()"  in  the  pcreapi  documentation.
+       Here  is  an example of pcretest output, where the /K modifier requests
        the retrieval and outputting of (*MARK) data:


            re> /X(*MARK:A)Y|X(*MARK:B)Z/K
@@ -6272,63 +6275,63 @@
          MK: B


        The (*MARK) name is tagged with "MK:" in this output, and in this exam-
-       ple it indicates which of the two alternatives matched. This is a  more
-       efficient  way of obtaining this information than putting each alterna-
+       ple  it indicates which of the two alternatives matched. This is a more
+       efficient way of obtaining this information than putting each  alterna-
        tive in its own capturing parentheses.
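
For comparison, here is the capturing-parentheses approach that (*MARK) is said to improve on, sketched with Python's standard re module (which has no (*MARK) verb). The group names A and B mirror the mark names in the example; which_alternative is a hypothetical helper.

```python
import re

# Each alternative sits in its own named capturing group.
ALTS = re.compile(r"(?P<A>XY)|(?P<B>XZ)")


def which_alternative(subject):
    m = ALTS.match(subject)
    # lastgroup is the name of the last capturing group that matched.
    return m.lastgroup if m else None


print(which_alternative("XY"), which_alternative("XZ"), which_alternative("XP"))
```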


        If (*MARK) is encountered in a positive assertion, its name is recorded
        and passed back if it is the last-encountered. This does not happen for
        negative assertions.


-       After a partial match or a failed match, the name of the  last  encoun-
+       After  a  partial match or a failed match, the name of the last encoun-
        tered (*MARK) in the entire match process is returned. For example:


            re> /X(*MARK:A)Y|X(*MARK:B)Z/K
          data> XP
          No match, mark = B


-       Note  that  in  this  unanchored  example the mark is retained from the
+       Note that in this unanchored example the  mark  is  retained  from  the
        match attempt that started at the letter "X" in the subject. Subsequent
        match attempts starting at "P" and then with an empty string do not get
        as far as the (*MARK) item, but nevertheless do not reset it.


-       If you are interested in  (*MARK)  values  after  failed  matches,  you
-       should  probably  set  the PCRE_NO_START_OPTIMIZE option (see above) to
+       If  you  are  interested  in  (*MARK)  values after failed matches, you
+       should probably set the PCRE_NO_START_OPTIMIZE option  (see  above)  to
        ensure that the match is always attempted.


    Verbs that act after backtracking


        The following verbs do nothing when they are encountered. Matching con-
-       tinues  with what follows, but if there is no subsequent match, causing
-       a backtrack to the verb, a failure is  forced.  That  is,  backtracking
-       cannot  pass  to the left of the verb. However, when one of these verbs
-       appears inside an atomic group, its effect is confined to  that  group,
-       because  once the group has been matched, there is never any backtrack-
-       ing into it. In this situation, backtracking can  "jump  back"  to  the
-       left  of the entire atomic group. (Remember also, as stated above, that
+       tinues with what follows, but if there is no subsequent match,  causing
+       a  backtrack  to  the  verb, a failure is forced. That is, backtracking
+       cannot pass to the left of the verb. However, when one of  these  verbs
+       appears  inside  an atomic group, its effect is confined to that group,
+       because once the group has been matched, there is never any  backtrack-
+       ing  into  it.  In  this situation, backtracking can "jump back" to the
+       left of the entire atomic group. (Remember also, as stated above,  that
        this localization also applies in subroutine calls and assertions.)


-       These verbs differ in exactly what kind of failure  occurs  when  back-
+       These  verbs  differ  in exactly what kind of failure occurs when back-
        tracking reaches them.


          (*COMMIT)


-       This  verb, which may not be followed by a name, causes the whole match
+       This verb, which may not be followed by a name, causes the whole  match
        to fail outright if the rest of the pattern does not match. Even if the
        pattern is unanchored, no further attempts to find a match by advancing
        the  starting  point  take  place.  Once  (*COMMIT)  has  been  passed,
-       pcre_exec()  is  committed  to  finding a match at the current starting
+       pcre_exec() is committed to finding a match  at  the  current  starting
        point, or not at all. For example:


          a+(*COMMIT)b


-       This matches "xxaab" but not "aacaab". It can be thought of as  a  kind
+       This  matches  "xxaab" but not "aacaab". It can be thought of as a kind
        of dynamic anchor, or "I've started, so I must finish." The name of the
-       most recently passed (*MARK) in the path is passed back when  (*COMMIT)
+       most  recently passed (*MARK) in the path is passed back when (*COMMIT)
        forces a match failure.


-       Note  that  (*COMMIT)  at  the start of a pattern is not the same as an
-       anchor, unless PCRE's start-of-match optimizations are turned  off,  as
+       Note that (*COMMIT) at the start of a pattern is not  the  same  as  an
+       anchor,  unless  PCRE's start-of-match optimizations are turned off, as
        shown in this pcretest example:


            re> /(*COMMIT)abc/
@@ -6337,111 +6340,111 @@
          xyzabc\Y
          No match


-       PCRE  knows  that  any  match  must start with "a", so the optimization
-       skips along the subject to "a" before running the first match  attempt,
-       which  succeeds.  When the optimization is disabled by the \Y escape in
+       PCRE knows that any match must start  with  "a",  so  the  optimization
+       skips  along the subject to "a" before running the first match attempt,
+       which succeeds. When the optimization is disabled by the \Y  escape  in
        the second subject, the match starts at "x" and so the (*COMMIT) causes
        it to fail without trying any other starting points.


          (*PRUNE) or (*PRUNE:NAME)


-       This  verb causes the match to fail at the current starting position in
-       the subject if the rest of the pattern does not match. If  the  pattern
-       is  unanchored,  the  normal  "bumpalong"  advance to the next starting
-       character then happens. Backtracking can occur as usual to the left  of
-       (*PRUNE),  before  it  is  reached,  or  when  matching to the right of
-       (*PRUNE), but if there is no match to the  right,  backtracking  cannot
-       cross  (*PRUNE). In simple cases, the use of (*PRUNE) is just an alter-
-       native to an atomic group or possessive quantifier, but there are  some
+       This verb causes the match to fail at the current starting position  in
+       the  subject  if the rest of the pattern does not match. If the pattern
+       is unanchored, the normal "bumpalong"  advance  to  the  next  starting
+       character  then happens. Backtracking can occur as usual to the left of
+       (*PRUNE), before it is reached,  or  when  matching  to  the  right  of
+       (*PRUNE),  but  if  there is no match to the right, backtracking cannot
+       cross (*PRUNE). In simple cases, the use of (*PRUNE) is just an  alter-
+       native  to an atomic group or possessive quantifier, but there are some
        uses of (*PRUNE) that cannot be expressed in any other way.  The behav-
-       iour of (*PRUNE:NAME)  is  the  same  as  (*MARK:NAME)(*PRUNE).  In  an
+       iour  of  (*PRUNE:NAME)  is  the  same  as  (*MARK:NAME)(*PRUNE). In an
        anchored pattern (*PRUNE) has the same effect as (*COMMIT).


          (*SKIP)


-       This  verb, when given without a name, is like (*PRUNE), except that if
-       the pattern is unanchored, the "bumpalong" advance is not to  the  next
+       This verb, when given without a name, is like (*PRUNE), except that  if
+       the  pattern  is unanchored, the "bumpalong" advance is not to the next
        character, but to the position in the subject where (*SKIP) was encoun-
-       tered. (*SKIP) signifies that whatever text was matched leading  up  to
+       tered.  (*SKIP)  signifies that whatever text was matched leading up to
        it cannot be part of a successful match. Consider:


          a+(*SKIP)b


-       If  the  subject  is  "aaaac...",  after  the first match attempt fails
-       (starting at the first character in the  string),  the  starting  point
+       If the subject is "aaaac...",  after  the  first  match  attempt  fails
+       (starting  at  the  first  character in the string), the starting point
        skips on to start the next attempt at "c". Note that a possessive quan-
-       tifier does not have the same effect as this example; although it  would
-       suppress  backtracking  during  the  first  match  attempt,  the second
-       attempt would start at the second character instead of skipping  on  to
+       tifier  does not have the same effect as this example; although it would
+       suppress backtracking  during  the  first  match  attempt,  the  second
+       attempt  would  start at the second character instead of skipping on to
        "c".


          (*SKIP:NAME)


-       When  (*SKIP) has an associated name, its behaviour is modified. If the
+       When (*SKIP) has an associated name, its behaviour is modified. If  the
        following pattern fails to match, the previous path through the pattern
-       is  searched for the most recent (*MARK) that has the same name. If one
-       is found, the "bumpalong" advance is to the subject position that  cor-
-       responds  to  that (*MARK) instead of to where (*SKIP) was encountered.
+       is searched for the most recent (*MARK) that has the same name. If  one
+       is  found, the "bumpalong" advance is to the subject position that cor-
+       responds to that (*MARK) instead of to where (*SKIP)  was  encountered.
        If no (*MARK) with a matching name is found, the (*SKIP) is ignored.


          (*THEN) or (*THEN:NAME)


-       This verb causes a skip to the next innermost alternative if  the  rest
-       of  the  pattern does not match. That is, it cancels pending backtrack-
-       ing, but only within the current alternative. Its name comes  from  the
+       This  verb  causes a skip to the next innermost alternative if the rest
+       of the pattern does not match. That is, it cancels  pending  backtrack-
+       ing,  but  only within the current alternative. Its name comes from the
        observation that it can be used for a pattern-based if-then-else block:


          ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...


-       If  the COND1 pattern matches, FOO is tried (and possibly further items
-       after the end of the group if FOO succeeds); on  failure,  the  matcher
-       skips  to  the second alternative and tries COND2, without backtracking
-       into COND1. The behaviour  of  (*THEN:NAME)  is  exactly  the  same  as
-       (*MARK:NAME)(*THEN).   If (*THEN) is not inside an alternation, it acts
+       If the COND1 pattern matches, FOO is tried (and possibly further  items
+       after  the  end  of the group if FOO succeeds); on failure, the matcher
+       skips to the second alternative and tries COND2,  without  backtracking
+       into  COND1.  The  behaviour  of  (*THEN:NAME)  is  exactly the same as
+       (*MARK:NAME)(*THEN).  If (*THEN) is not inside an alternation, it  acts
        like (*PRUNE).


-       Note that a subpattern that does not contain a | character  is  just  a
-       part  of the enclosing alternative; it is not a nested alternation with
-       only one alternative. The effect of (*THEN) extends beyond such a  sub-
-       pattern  to  the enclosing alternative. Consider this pattern, where A,
+       Note  that  a  subpattern that does not contain a | character is just a
+       part of the enclosing alternative; it is not a nested alternation  with
+       only  one alternative. The effect of (*THEN) extends beyond such a sub-
+       pattern to the enclosing alternative. Consider this pattern,  where  A,
        B, etc. are complex pattern fragments that do not contain any | charac-
        ters at this level:


          A (B(*THEN)C) | D


-       If  A and B are matched, but there is a failure in C, matching does not
+       If A and B are matched, but there is a failure in C, matching does  not
        backtrack into A; instead it moves to the next alternative, that is, D.
-       However,  if the subpattern containing (*THEN) is given an alternative,
+       However, if the subpattern containing (*THEN) is given an  alternative,
        it behaves differently:


          A (B(*THEN)C | (*FAIL)) | D


-       The effect of (*THEN) is now confined to the inner subpattern. After  a
+       The  effect of (*THEN) is now confined to the inner subpattern. After a
        failure in C, matching moves to (*FAIL), which causes the whole subpat-
-       tern to fail because there are no more alternatives  to  try.  In  this
+       tern  to  fail  because  there are no more alternatives to try. In this
        case, matching does now backtrack into A.


        Note also that a conditional subpattern is not considered as having two
-       alternatives, because only one is ever used.  In  other  words,  the  |
+       alternatives,  because  only  one  is  ever used. In other words, the |
        character in a conditional subpattern has a different meaning. Ignoring
        white space, consider:


          ^.*? (?(?=a) a | b(*THEN)c )


-       If the subject is "ba", this pattern does not  match.  Because  .*?  is
-       ungreedy,  it  initially  matches  zero characters. The condition (?=a)
-       then fails, the character "b" is matched,  but  "c"  is  not.  At  this
-       point,  matching does not backtrack to .*? as might perhaps be expected
-       from the presence of the | character.  The  conditional  subpattern  is
+       If  the  subject  is  "ba", this pattern does not match. Because .*? is
+       ungreedy, it initially matches zero  characters.  The  condition  (?=a)
+       then  fails,  the  character  "b"  is  matched, but "c" is not. At this
+       point, matching does not backtrack to .*? as might perhaps be  expected
+       from  the  presence  of  the | character. The conditional subpattern is
        part of the single alternative that comprises the whole pattern, and so
-       the match fails. (If there was a backtrack into  .*?,  allowing  it  to
+       the  match  fails.  (If  there was a backtrack into .*?, allowing it to
        match "b", the match would succeed.)


-       The  verbs just described provide four different "strengths" of control
+       The verbs just described provide four different "strengths" of  control
        when subsequent matching fails. (*THEN) is the weakest, carrying on the
-       match  at  the next alternative. (*PRUNE) comes next, failing the match
-       at the current starting position, but allowing an advance to  the  next
-       character  (for an unanchored pattern). (*SKIP) is similar, except that
+       match at the next alternative. (*PRUNE) comes next, failing  the  match
+       at  the  current starting position, but allowing an advance to the next
+       character (for an unanchored pattern). (*SKIP) is similar, except  that
        the advance may be more than one character. (*COMMIT) is the strongest,
        causing the entire match to fail.


@@ -6451,15 +6454,15 @@

          (A(*COMMIT)B(*THEN)C|D)


-       Once  A  has  matched,  PCRE is committed to this match, at the current
-       starting position. If subsequently B matches, but C does not, the  nor-
+       Once A has matched, PCRE is committed to this  match,  at  the  current
+       starting  position. If subsequently B matches, but C does not, the nor-
        mal (*THEN) action of trying the next alternative (that is, D) does not
        happen because (*COMMIT) overrides.



SEE ALSO

-       pcreapi(3), pcrecallout(3),  pcrematching(3),  pcresyntax(3),  pcre(3),
+       pcreapi(3),  pcrecallout(3),  pcrematching(3),  pcresyntax(3), pcre(3),
        pcre16(3).



@@ -6472,7 +6475,7 @@

REVISION

-       Last updated: 24 February 2012
+       Last updated: 14 April 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------


@@ -6915,127 +6918,130 @@

        When you set the PCRE_UTF8 flag, the byte strings  passed  as  patterns
        and subjects are (by default) checked for validity on entry to the rel-
-       evant functions. From release 7.3 of PCRE, the check is according to the
+       evant functions. The entire string is checked before any other process-
+       ing takes place. From release 7.3 of PCRE, the check is according to the
        rules of RFC 3629, which are themselves derived from the Unicode speci-
-       fication. Earlier releases of PCRE followed  the  rules  of  RFC  2279,
-       which  allows  the  full  range of 31-bit values (0 to 0x7FFFFFFF). The
-       current check allows only values in the range U+0 to U+10FFFF,  exclud-
+       fication.  Earlier  releases  of  PCRE  followed the rules of RFC 2279,
+       which allows the full range of 31-bit values  (0  to  0x7FFFFFFF).  The
+       current  check allows only values in the range U+0 to U+10FFFF, exclud-
        ing U+D800 to U+DFFF.


-       The  excluded code points are the "Surrogate Area" of Unicode. They are
-       reserved for use by UTF-16, where they are  used  in  pairs  to  encode
-       codepoints  with  values  greater than 0xFFFF. The code points that are
+       The excluded code points are the "Surrogate Area" of Unicode. They  are
+       reserved  for  use  by  UTF-16,  where they are used in pairs to encode
+       codepoints with values greater than 0xFFFF. The code  points  that  are
        encoded by UTF-16 pairs are available independently in the UTF-8 encod-
-       ing.  (In  other words, the whole surrogate thing is a fudge for UTF-16
+       ing. (In other words, the whole surrogate thing is a fudge  for  UTF-16
        which unfortunately messes up UTF-8.)


        If an invalid UTF-8 string is passed to PCRE, an error return is given.
-       At  compile  time, the only additional information is the offset to the
-       first byte of the failing character. The runtime functions  pcre_exec()
-       and  pcre_dfa_exec() also pass back this information, as well as a more
-       detailed reason code if the caller has provided memory in which  to  do
+       At compile time, the only additional information is the offset  to  the
+       first  byte of the failing character. The runtime functions pcre_exec()
+       and pcre_dfa_exec() also pass back this information, as well as a  more
+       detailed  reason  code if the caller has provided memory in which to do
        this.


-       In  some  situations, you may already know that your strings are valid,
-       and therefore want to skip these checks in  order  to  improve  perfor-
-       mance. If you set the PCRE_NO_UTF8_CHECK flag at compile time or at run
-       time, PCRE assumes that the pattern or subject  it  is  given  (respec-
-       tively)  contains  only  valid  UTF-8  codes. In this case, it does not
-       diagnose an invalid UTF-8 string.
+       In some situations, you may already know that your strings  are  valid,
+       and  therefore  want  to  skip these checks in order to improve perfor-
+       mance, for example in the case of a long subject string that  is  being
+       scanned   repeatedly   with   different   patterns.   If  you  set  the
+       PCRE_NO_UTF8_CHECK flag at compile time or at run  time,  PCRE  assumes
+       that  the  pattern  or subject it is given (respectively) contains only
+       valid UTF-8 codes. In this case, it does not diagnose an invalid  UTF-8
+       string.


-       If you pass an invalid UTF-8 string  when  PCRE_NO_UTF8_CHECK  is  set,
-       what  happens  depends on why the string is invalid. If the string con-
+       If  you  pass  an  invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set,
+       what happens depends on why the string is invalid. If the  string  con-
        forms to the "old" definition of UTF-8 (RFC 2279), it is processed as a
-       string  of  characters  in the range 0 to 0x7FFFFFFF by pcre_dfa_exec()
-       and the interpreted version of pcre_exec(). In other words, apart  from
-       the  initial validity test, these functions (when in UTF-8 mode) handle
-       strings according to the more liberal rules of RFC 2279.  However,  the
+       string of characters in the range 0 to  0x7FFFFFFF  by  pcre_dfa_exec()
+       and  the interpreted version of pcre_exec(). In other words, apart from
+       the initial validity test, these functions (when in UTF-8 mode)  handle
+       strings  according  to the more liberal rules of RFC 2279. However, the
        just-in-time (JIT) optimization for pcre_exec() supports only RFC 3629.
-       If you are using JIT optimization, or if the string does not even  con-
+       If  you are using JIT optimization, or if the string does not even con-
        form to RFC 2279, the result is undefined. Your program may crash.


-       If  you  want  to  process  strings  of  values  in the full range 0 to
-       0x7FFFFFFF, encoded in a UTF-8-like manner as per the old RFC, you  can
+       If you want to process strings  of  values  in  the  full  range  0  to
+       0x7FFFFFFF,  encoded in a UTF-8-like manner as per the old RFC, you can
        set PCRE_NO_UTF8_CHECK to bypass the more restrictive test. However, in
-       this situation, you will have to apply your  own  validity  check,  and
+       this  situation,  you  will  have to apply your own validity check, and
        avoid the use of JIT optimization.


    Validity of UTF-16 strings


        When you set the PCRE_UTF16 flag, the strings of 16-bit data units that
        are passed as patterns and subjects are (by default) checked for valid-
-       ity  on entry to the relevant functions. Values other than those in the
+       ity on entry to the relevant functions. Values other than those in  the
        surrogate range U+D800 to U+DFFF are independent code points. Values in
        the surrogate range must be used in pairs in the correct manner.


-       If  an  invalid  UTF-16  string  is  passed to PCRE, an error return is
-       given. At compile time, the only additional information is  the  offset
-       to  the first data unit of the failing character. The runtime functions
+       If an invalid UTF-16 string is passed  to  PCRE,  an  error  return  is
+       given.  At  compile time, the only additional information is the offset
+       to the first data unit of the failing character. The runtime  functions
        pcre16_exec() and pcre16_dfa_exec() also pass back this information, as
-       well  as  a more detailed reason code if the caller has provided memory
+       well as a more detailed reason code if the caller has  provided  memory
        in which to do this.


-       In some situations, you may already know that your strings  are  valid,
-       and  therefore  want  to  skip these checks in order to improve perfor-
-       mance. If you set the PCRE_NO_UTF16_CHECK flag at compile  time  or  at
+       In  some  situations, you may already know that your strings are valid,
+       and therefore want to skip these checks in  order  to  improve  perfor-
+       mance.  If  you  set the PCRE_NO_UTF16_CHECK flag at compile time or at
        run time, PCRE assumes that the pattern or subject it is given (respec-
        tively) contains only valid UTF-16 sequences. In this case, it does not
        diagnose an invalid UTF-16 string.


    General comments about UTF modes


-       1.  Codepoints  less  than  256  can  be  specified by either braced or
-       unbraced hexadecimal escape sequences (for example,  \x{b3}  or  \xb3).
+       1. Codepoints less than 256  can  be  specified  by  either  braced  or
+       unbraced  hexadecimal  escape  sequences (for example, \x{b3} or \xb3).
        Larger values have to use braced sequences.


-       2.  Octal  numbers  up  to \777 are recognized, and in UTF-8 mode, they
+       2. Octal numbers up to \777 are recognized, and  in  UTF-8  mode,  they
        match two-byte characters for values greater than \177.


        3. Repeat quantifiers apply to complete UTF characters, not to individ-
        ual data units, for example: \x{100}{3}.


-       4.  The dot metacharacter matches one UTF character instead of a single
+       4. The dot metacharacter matches one UTF character instead of a  single
        data unit.


-       5. The escape sequence \C can be used to match a single byte  in  UTF-8
+       5.  The  escape sequence \C can be used to match a single byte in UTF-8
        mode, or a single 16-bit data unit in UTF-16 mode, but its use can lead
        to some strange effects because it breaks up multi-unit characters (see
-       the  description of \C in the pcrepattern documentation). The use of \C
-       is   not   supported   in    the    alternative    matching    function
-       pcre[16]_dfa_exec(),  nor  is it supported in UTF mode by the JIT opti-
+       the description of \C in the pcrepattern documentation). The use of  \C
+       is    not    supported    in    the   alternative   matching   function
+       pcre[16]_dfa_exec(), nor is it supported in UTF mode by the  JIT  opti-
        mization of pcre[16]_exec(). If JIT optimization is requested for a UTF
        pattern that contains \C, it will not succeed, and so the matching will
        be carried out by the normal interpretive function.


-       6. The character escapes \b, \B, \d, \D, \s, \S, \w, and  \W  correctly
+       6.  The  character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
        test characters of any code value, but, by default, the characters that
-       PCRE recognizes as digits, spaces, or word characters remain  the  same
-       set  as  in  non-UTF  mode, all with values less than 256. This remains
-       true even when PCRE is  built  to  include  Unicode  property  support,
+       PCRE  recognizes  as digits, spaces, or word characters remain the same
+       set as in non-UTF mode, all with values less  than  256.  This  remains
+       true  even  when  PCRE  is  built  to include Unicode property support,
        because to do otherwise would slow down PCRE in many common cases. Note
-       in particular that this applies to \b and \B, because they are  defined
+       in  particular that this applies to \b and \B, because they are defined
        in terms of \w and \W. If you really want to test for a wider sense of,
-       say, "digit", you can use  explicit  Unicode  property  tests  such  as
+       say,  "digit",  you  can  use  explicit  Unicode property tests such as
        \p{Nd}. Alternatively, if you set the PCRE_UCP option, the way that the
-       character escapes work is changed so that Unicode properties  are  used
+       character  escapes  work is changed so that Unicode properties are used
        to determine which characters match. There are more details in the sec-
        tion on generic character types in the pcrepattern documentation.


-       7. Similarly, characters that match the POSIX named  character  classes
+       7.  Similarly,  characters that match the POSIX named character classes
        are all low-valued characters, unless the PCRE_UCP option is set.


-       8.  However,  the  horizontal  and vertical whitespace matching escapes
-       (\h, \H, \v, and \V) do match all the appropriate  Unicode  characters,
+       8. However, the horizontal and  vertical  whitespace  matching  escapes
+       (\h,  \H,  \v, and \V) do match all the appropriate Unicode characters,
        whether or not PCRE_UCP is set.


-       9.  Case-insensitive  matching  applies only to characters whose values
-       are less than 128, unless PCRE is built with Unicode property  support.
-       Even  when  Unicode  property support is available, PCRE still uses its
-       own character tables when checking the case of  low-valued  characters,
-       so  as not to degrade performance.  The Unicode property information is
+       9. Case-insensitive matching applies only to  characters  whose  values
+       are  less than 128, unless PCRE is built with Unicode property support.
+       Even when Unicode property support is available, PCRE  still  uses  its
+       own  character  tables when checking the case of low-valued characters,
+       so as not to degrade performance.  The Unicode property information  is
        used only for characters with higher values. Furthermore, PCRE supports
-       case-insensitive  matching  only  when  there  is  a one-to-one mapping
-       between a letter's cases. There are a small number of many-to-one  map-
+       case-insensitive matching only  when  there  is  a  one-to-one  mapping
+       between  a letter's cases. There are a small number of many-to-one map-
        pings in Unicode; these are not supported by PCRE.



@@ -7048,7 +7054,7 @@

REVISION

-       Last updated: 13 January 2012
+       Last updated: 14 April 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------


@@ -7195,8 +7201,9 @@
UNSUPPORTED OPTIONS AND PATTERN ITEMS

        The  only  pcre_exec() options that are supported for JIT execution are
-       PCRE_NO_UTF8_CHECK,    PCRE_NOTBOL,     PCRE_NOTEOL,     PCRE_NOTEMPTY,
-       PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT.
+       PCRE_NO_UTF8_CHECK,  PCRE_NO_UTF16_CHECK,   PCRE_NOTBOL,   PCRE_NOTEOL,
+       PCRE_NOTEMPTY,  PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PAR-
+       TIAL_SOFT.


        The unsupported pattern items are:


@@ -7213,65 +7220,65 @@

RETURN VALUES FROM JIT EXECUTION

-       When  a  pattern  is matched using JIT execution, the return values are
-       the same as those given by the interpretive pcre_exec() code, with  the
-       addition  of  one new error code: PCRE_ERROR_JIT_STACKLIMIT. This means
-       that the memory used for the JIT stack was insufficient. See  "Control-
+       When a pattern is matched using JIT execution, the  return  values  are
+       the  same as those given by the interpretive pcre_exec() code, with the
+       addition of one new error code: PCRE_ERROR_JIT_STACKLIMIT.  This  means
+       that  the memory used for the JIT stack was insufficient. See "Control-
        ling the JIT stack" below for a discussion of JIT stack usage. For com-
-       patibility with the interpretive pcre_exec() code, no  more  than  two-
-       thirds  of  the ovector argument is used for passing back captured sub-
+       patibility  with  the  interpretive pcre_exec() code, no more than two-
+       thirds of the ovector argument is used for passing back  captured  sub-
        strings.


-       The error code PCRE_ERROR_MATCHLIMIT is returned by  the  JIT  code  if
-       searching  a  very large pattern tree goes on for too long, as it is in
-       the same circumstance when JIT is not used, but the details of  exactly
-       what  is  counted are not the same. The PCRE_ERROR_RECURSIONLIMIT error
+       The  error  code  PCRE_ERROR_MATCHLIMIT  is returned by the JIT code if
+       searching a very large pattern tree goes on for too long, as it  is  in
+       the  same circumstance when JIT is not used, but the details of exactly
+       what is counted are not the same. The  PCRE_ERROR_RECURSIONLIMIT  error
        code is never returned by JIT execution.



SAVING AND RESTORING COMPILED PATTERNS

-       The code that is generated by the  JIT  compiler  is  architecture-spe-
-       cific,  and  is also position dependent. For those reasons it cannot be
-       saved (in a file or database) and restored later like the bytecode  and
-       other  data  of  a compiled pattern. Saving and restoring compiled pat-
-       terns is not something many people do. More detail about this  facility
-       is  given in the pcreprecompile documentation. It should be possible to
-       run pcre_study() on a saved and restored pattern, and thereby  recreate
-       the  JIT  data, but because JIT compilation uses significant resources,
-       it is probably not worth doing this; you might as  well  recompile  the
+       The  code  that  is  generated by the JIT compiler is architecture-spe-
+       cific, and is also position dependent. For those reasons it  cannot  be
+       saved  (in a file or database) and restored later like the bytecode and
+       other data of a compiled pattern. Saving and  restoring  compiled  pat-
+       terns  is not something many people do. More detail about this facility
+       is given in the pcreprecompile documentation. It should be possible  to
+       run  pcre_study() on a saved and restored pattern, and thereby recreate
+       the JIT data, but because JIT compilation uses  significant  resources,
+       it  is  probably  not worth doing this; you might as well recompile the
        original pattern.



CONTROLLING THE JIT STACK

        When the compiled JIT code runs, it needs a block of memory to use as a
-       stack.  By default, it uses 32K on the  machine  stack.  However,  some
-       large   or   complicated  patterns  need  more  than  this.  The  error
-       PCRE_ERROR_JIT_STACKLIMIT is given when  there  is  not  enough  stack.
-       Three  functions  are provided for managing blocks of memory for use as
-       JIT stacks. There is further discussion about the use of JIT stacks  in
+       stack.   By  default,  it  uses 32K on the machine stack. However, some
+       large  or  complicated  patterns  need  more  than  this.   The   error
+       PCRE_ERROR_JIT_STACKLIMIT  is  given  when  there  is not enough stack.
+       Three functions are provided for managing blocks of memory for  use  as
+       JIT  stacks. There is further discussion about the use of JIT stacks in
        the section entitled "JIT stack FAQ" below.


-       The  pcre_jit_stack_alloc() function creates a JIT stack. Its arguments
-       are a starting size and a maximum size, and it returns a pointer to  an
-       opaque  structure of type pcre_jit_stack, or NULL if there is an error.
-       The pcre_jit_stack_free() function can be used to free a stack that  is
-       no  longer  needed.  (For  the technically minded: the address space is
+       The pcre_jit_stack_alloc() function creates a JIT stack. Its  arguments
+       are  a starting size and a maximum size, and it returns a pointer to an
+       opaque structure of type pcre_jit_stack, or NULL if there is an  error.
+       The  pcre_jit_stack_free() function can be used to free a stack that is
+       no longer needed. (For the technically minded:  the  address  space  is
        allocated by mmap or VirtualAlloc.)


-       JIT uses far less memory for recursion than the interpretive code,  and
-       a  maximum  stack size of 512K to 1M should be more than enough for any
+       JIT  uses far less memory for recursion than the interpretive code, and
+       a maximum stack size of 512K to 1M should be more than enough  for  any
        pattern.


-       The pcre_assign_jit_stack() function specifies  which  stack  JIT  code
+       The  pcre_assign_jit_stack()  function  specifies  which stack JIT code
        should use. Its arguments are as follows:


          pcre_extra         *extra
          pcre_jit_callback  callback
          void               *data


-       The  extra  argument  must  be  the  result  of studying a pattern with
+       The extra argument must be  the  result  of  studying  a  pattern  with
        PCRE_STUDY_JIT_COMPILE etc. There are three cases for the values of the
        other two options:


@@ -7288,29 +7295,29 @@
              return value must be a valid JIT stack, the result of calling
              pcre_jit_stack_alloc().


-       A  callback function is obeyed whenever JIT code is about to be run; it
-       is not obeyed when pcre_exec() is called with options that  are  incom-
+       A callback function is obeyed whenever JIT code is about to be run;  it
+       is  not  obeyed when pcre_exec() is called with options that are incom-
        patible for JIT execution. A callback function can therefore be used to
-       determine whether a match operation was  executed  by  JIT  or  by  the
+       determine  whether  a  match  operation  was  executed by JIT or by the
        interpreter.


        You may safely use the same JIT stack for more than one pattern (either
-       by assigning directly or by callback), as long as the patterns are  all
-       matched  sequentially in the same thread. In a multithread application,
-       if you do not specify a JIT stack, or if you assign or pass  back  NULL
-       from  a  callback, that is thread-safe, because each thread has its own
-       machine stack. However, if you assign  or  pass  back  a  non-NULL  JIT
-       stack,  this  must  be  a  different  stack for each thread so that the
+       by  assigning directly or by callback), as long as the patterns are all
+       matched sequentially in the same thread. In a multithread  application,
+       if  you  do not specify a JIT stack, or if you assign or pass back NULL
+       from a callback, that is thread-safe, because each thread has  its  own
+       machine  stack.  However,  if  you  assign  or pass back a non-NULL JIT
+       stack, this must be a different stack  for  each  thread  so  that  the
        application is thread-safe.


-       Strictly speaking, even more is allowed. You can assign the  same  non-
-       NULL  stack  to any number of patterns as long as they are not used for
-       matching by multiple threads at the same time.  For  example,  you  can
-       assign  the same stack to all compiled patterns, and use a global mutex
-       in the callback to wait until the stack is available for use.  However,
+       Strictly  speaking,  even more is allowed. You can assign the same non-
+       NULL stack to any number of patterns as long as they are not  used  for
+       matching  by  multiple  threads  at the same time. For example, you can
+       assign the same stack to all compiled patterns, and use a global  mutex
+       in  the callback to wait until the stack is available for use. However,
        this is an inefficient solution, and not recommended.


-       This  is a suggestion for how a multithreaded program that needs to set
+       This is a suggestion for how a multithreaded program that needs to  set
        up non-default JIT stacks might operate:


          During thread initialization
@@ -7322,9 +7329,9 @@
          Use a one-line callback function
            return thread_local_var


-       All the functions described in this section do nothing if  JIT  is  not
-       available,  and  pcre_assign_jit_stack()  does nothing unless the extra
-       argument is non-NULL and points to  a  pcre_extra  block  that  is  the
+       All  the  functions  described in this section do nothing if JIT is not
+       available, and pcre_assign_jit_stack() does nothing  unless  the  extra
+       argument  is  non-NULL  and  points  to  a pcre_extra block that is the
        result of a successful study with PCRE_STUDY_JIT_COMPILE etc.



@@ -7332,73 +7339,73 @@

        (1) Why do we need JIT stacks?


-       PCRE  (and JIT) is a recursive, depth-first engine, so it needs a stack
-       where the local data of the current node is pushed before checking  its
+       PCRE (and JIT) is a recursive, depth-first engine, so it needs a  stack
+       where  the local data of the current node is pushed before checking its
        child nodes.  Allocating real machine stack on some platforms is diffi-
        cult. For example, the stack chain needs to be updated every time if we
-       extend  the  stack  on  PowerPC.  Although it is possible, its updating
+       extend the stack on PowerPC.  Although it  is  possible,  its  updating
        time overhead decreases performance. So we do the recursion in memory.


        (2) Why don't we simply allocate blocks of memory with malloc()?


-       Modern operating systems have a  nice  feature:  they  can  reserve  an
+       Modern  operating  systems  have  a  nice  feature: they can reserve an
        address space instead of allocating memory. We can safely allocate mem-
-       ory pages inside this address space, so the stack  could  grow  without
+       ory  pages  inside  this address space, so the stack could grow without
        moving memory data (this is important because of pointers). Thus we can
-       allocate 1M address space, and use only a single memory  page  (usually
-       4K)  if  that is enough. However, we can still grow up to 1M anytime if
+       allocate  1M  address space, and use only a single memory page (usually
+       4K) if that is enough. However, we can still grow up to 1M  anytime  if
        needed.


        (3) Who "owns" a JIT stack?


        The owner of the stack is the user program, not the JIT studied pattern
-       or  anything else. The user program must ensure that if a stack is used
-       by pcre_exec(), (that is, it is assigned to the pattern currently  run-
+       or anything else. The user program must ensure that if a stack is  used
+       by  pcre_exec(), (that is, it is assigned to the pattern currently run-
        ning), that stack must not be used by any other threads (to avoid over-
        writing the same memory area). The best practice for multithreaded pro-
-       grams  is  to  allocate  a stack for each thread, and return this stack
+       grams is to allocate a stack for each thread,  and  return  this  stack
        through the JIT callback function.


        (4) When should a JIT stack be freed?


        You can free a JIT stack at any time, as long as it will not be used by
-       pcre_exec()  again.  When  you  assign  the  stack to a pattern, only a
-       pointer is set. There is no reference counting or any other magic.  You
-       can  free  the  patterns  and stacks in any order, anytime. Just do not
-       call pcre_exec() with a pattern pointing to an already freed stack,  as
-       that  will cause SEGFAULT. (Also, do not free a stack currently used by
-       pcre_exec() in another thread). You can also replace the  stack  for  a
-       pattern  at  any  time.  You  can  even  free the previous stack before
+       pcre_exec() again. When you assign the  stack  to  a  pattern,  only  a
+       pointer  is set. There is no reference counting or any other magic. You
+       can free the patterns and stacks in any order,  anytime.  Just  do  not
+       call  pcre_exec() with a pattern pointing to an already freed stack, as
+       that will cause SEGFAULT. (Also, do not free a stack currently used  by
+       pcre_exec()  in  another  thread). You can also replace the stack for a
+       pattern at any time. You  can  even  free  the  previous  stack  before
        assigning a replacement.


-       (5) Should I allocate/free a  stack  every  time  before/after  calling
+       (5)  Should  I  allocate/free  a  stack every time before/after calling
        pcre_exec()?


-       No,  because  this  is  too  costly in terms of resources. However, you
-       could implement some clever idea which releases the stack if it is  not
+       No, because this is too costly in  terms  of  resources.  However,  you
+       could implement  some clever idea which releases the stack if it is not
        used in let's say two minutes. The JIT callback can help to achieve this
        without keeping a list of the currently JIT studied patterns.


-       (6) OK, the stack is for long term memory allocation. But what  happens
-       if  a pattern causes stack overflow with a stack of 1M? Is that 1M kept
+       (6)  OK, the stack is for long term memory allocation. But what happens
+       if a pattern causes stack overflow with a stack of 1M? Is that 1M  kept
        until the stack is freed?


-       Especially on embedded systems, it might be a good idea to release  mem-
-       ory  sometimes  without  freeing the stack. There is no API for this at
-       the moment.  Probably a function call which returns with the  currently
-       allocated  memory for any stack and another which allows releasing mem-
+       Especially on embedded systems, it might be a good idea to release mem-
+       ory sometimes without freeing the stack. There is no API  for  this  at
+       the  moment.  Probably a function call which returns with the currently
+       allocated memory for any stack and another which allows releasing  mem-
        ory (shrinking the stack) would be a good idea if someone needs this.


        (7) This is too much of a headache. Isn't there any better solution for
        JIT stack handling?


-       No,  thanks to Windows. If POSIX threads were used everywhere, we could
+       No, thanks to Windows. If POSIX threads were used everywhere, we  could
        throw out this complicated API.



EXAMPLE CODE

-       This is a single-threaded example that specifies a  JIT  stack  without
+       This  is  a  single-threaded example that specifies a JIT stack without
        using a callback.


          int rc;
@@ -7434,7 +7441,7 @@


REVISION

-       Last updated: 23 February 2012
+       Last updated: 14 April 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------



Modified: code/trunk/doc/pcre16.3
===================================================================
--- code/trunk/doc/pcre16.3    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/pcre16.3    2012-04-14 16:16:58 UTC (rev 959)
@@ -1,4 +1,4 @@
-.TH PCRE 3 "08 January 2012" "PCRE 8.30"
+.TH PCRE 3 "14 April 2012" "PCRE 8.31"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .sp
@@ -264,7 +264,17 @@
 .sp
 There are two new general option names, PCRE_UTF16 and PCRE_NO_UTF16_CHECK,
 which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In
-fact, these new options define the same bits in the options word.
+fact, these new options define the same bits in the options word. There is a 
+discussion about the
+.\" HTML <a href="pcreunicode.html#utf16strings">
+.\" </a>
+validity of UTF-16 strings
+.\"
+in the
+.\" HREF
+\fBpcreunicode\fP
+.\"
+page. 
 .P
 For the \fBpcre16_config()\fP function there is an option PCRE_CONFIG_UTF16
 that returns 1 if UTF-16 support is configured, otherwise 0. If this option is
@@ -374,6 +384,6 @@
 .rs
 .sp
 .nf
-Last updated: 08 January 2012
+Last updated: 14 April 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/pcreapi.3    2012-04-14 16:16:58 UTC (rev 959)
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "24 February 2012" "PCRE 8.31"
+.TH PCREAPI 3 "14 April 2012" "PCRE 8.31"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .sp
@@ -1741,9 +1741,14 @@
 .sp
 When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8
 string is automatically checked when \fBpcre_exec()\fP is subsequently called.
-The value of \fIstartoffset\fP is also checked to ensure that it points to the
-start of a UTF-8 character. There is a discussion about the validity of UTF-8
-strings in the
+The entire string is checked before any other processing takes place. The value
+of \fIstartoffset\fP is also checked to ensure that it points to the start of a
+UTF-8 character. There is a discussion about the
+.\" HTML <a href="pcreunicode.html#utf8strings">
+.\" </a>
+validity of UTF-8 strings
+.\"
+in the
 .\" HREF
 \fBpcreunicode\fP
 .\"
@@ -2653,6 +2658,6 @@
 .rs
 .sp
 .nf
-Last updated: 24 February 2012
+Last updated: 14 April 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcrejit.3
===================================================================
--- code/trunk/doc/pcrejit.3    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/pcrejit.3    2012-04-14 16:16:58 UTC (rev 959)
@@ -1,4 +1,4 @@
-.TH PCREJIT 3 "23 February 2012" "PCRE 8.31"
+.TH PCREJIT 3 "14 April 2012" "PCRE 8.31"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE JUST-IN-TIME COMPILER SUPPORT"
@@ -142,8 +142,8 @@
 .rs
 .sp
 The only \fBpcre_exec()\fP options that are supported for JIT execution are
-PCRE_NO_UTF8_CHECK, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,
-PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT.
+PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK, PCRE_NOTBOL, PCRE_NOTEOL,
+PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT.
 .P
 The unsupported pattern items are:
 .sp
@@ -400,6 +400,6 @@
 .rs
 .sp
 .nf
-Last updated: 23 February 2012
+Last updated: 14 April 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/pcrepattern.3    2012-04-14 16:16:58 UTC (rev 959)
@@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "24 February 2012" "PCRE 8.31"
+.TH PCREPATTERN 3 "14 April 2012" "PCRE 8.31"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -1020,7 +1020,8 @@
 unit with \eC in a UTF mode means that the rest of the string may start with a
 malformed UTF character. This has undefined results, because PCRE assumes that
 it is dealing with valid UTF strings (and by default it checks this at the
-start of processing unless the PCRE_NO_UTF8_CHECK option is used).
+start of processing unless the PCRE_NO_UTF8_CHECK or PCRE_NO_UTF16_CHECK option
+is used).
 .P
 PCRE does not allow \eC to appear in lookbehind assertions
 .\" HTML <a href="#lookbehind">
@@ -2909,6 +2910,6 @@
 .rs
 .sp
 .nf
-Last updated: 24 February 2012
+Last updated: 14 April 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcreunicode.3
===================================================================
--- code/trunk/doc/pcreunicode.3    2012-04-11 10:19:10 UTC (rev 958)
+++ code/trunk/doc/pcreunicode.3    2012-04-14 16:16:58 UTC (rev 959)
@@ -1,4 +1,4 @@
-.TH PCREUNICODE 3 "13 January 2012" "PCRE 8.30"
+.TH PCREUNICODE 3 "14 April 2012" "PCRE 8.30"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "UTF-8, UTF-16, AND UNICODE PROPERTY SUPPORT"
@@ -70,11 +70,12 @@
 .sp
 When you set the PCRE_UTF8 flag, the byte strings passed as patterns and
 subjects are (by default) checked for validity on entry to the relevant
-functions. From release 7.3 of PCRE, the check is according the rules of RFC
-3629, which are themselves derived from the Unicode specification. Earlier
-releases of PCRE followed the rules of RFC 2279, which allows the full range of
-31-bit values (0 to 0x7FFFFFFF). The current check allows only values in the
-range U+0 to U+10FFFF, excluding U+D800 to U+DFFF.
+functions. The entire string is checked before any other processing takes
+place. From release 7.3 of PCRE, the check is according the rules of RFC 3629,
+which are themselves derived from the Unicode specification. Earlier releases
+of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit
+values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0
+to U+10FFFF, excluding U+D800 to U+DFFF.
 .P
 The excluded code points are the "Surrogate Area" of Unicode. They are reserved
 for use by UTF-16, where they are used in pairs to encode codepoints with
@@ -89,10 +90,12 @@
 detailed reason code if the caller has provided memory in which to do this.
 .P
 In some situations, you may already know that your strings are valid, and
-therefore want to skip these checks in order to improve performance. If you set
-the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE assumes that
-the pattern or subject it is given (respectively) contains only valid UTF-8
-codes. In this case, it does not diagnose an invalid UTF-8 string.
+therefore want to skip these checks in order to improve performance, for
+example in the case of a long subject string that is being scanned repeatedly
+with different patterns. If you set the PCRE_NO_UTF8_CHECK flag at compile time
+or at run time, PCRE assumes that the pattern or subject it is given
+(respectively) contains only valid UTF-8 codes. In this case, it does not
+diagnose an invalid UTF-8 string.
 .P
 If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what
 happens depends on why the string is invalid. If the string conforms to the
@@ -217,6 +220,6 @@
 .rs
 .sp
 .nf
-Last updated: 13 January 2012
+Last updated: 14 April 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi
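
The pcreunicode text changed above says that the whole string is checked
before any other processing, and that only values in the range U+0 to
U+10FFFF, excluding U+D800 to U+DFFF, are accepted. As a rough illustration of
what such an up-front pass might look like, here is a minimal sketch in C. It
is not PCRE's own checking code: the function name first_invalid_utf8 and its
return convention are hypothetical, and the rejection of overlong forms
follows RFC 3629 rather than anything stated in this commit.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of the PCRE API.
   Scans the complete buffer and returns the byte offset of the first
   invalid UTF-8 sequence, or -1 if every sequence is valid per RFC 3629. */
static long first_invalid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;          /* number of continuation bytes expected */
        unsigned long cp;      /* decoded code point */
        unsigned long min;     /* smallest value allowed for this length */
        size_t k;

        if (c < 0x80) {        /* single-byte (ASCII) character */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return (long)i;    /* stray continuation byte, or a 5/6-byte
                                  lead byte (RFC 2279 only) */
        }

        if (len - i <= extra)
            return (long)i;    /* sequence is truncated at end of buffer */

        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return (long)i;            /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        if (cp < min)
            return (long)i;    /* overlong encoding */
        if (cp > 0x10FFFF)
            return (long)i;    /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return (long)i;    /* surrogate area, reserved for UTF-16 */

        i += extra + 1;
    }
    return -1;                 /* the entire string is valid */
}
```

A caller that has run a pass like this once over a long subject could then
reasonably set PCRE_NO_UTF8_CHECK on later matches of that same subject, which
is the repeated-scan case the revised paragraph describes.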