[Pcre-svn] [975] code/trunk: Document update for 8.31-RC1 te…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [975] code/trunk: Document update for 8.31-RC1 test release.
Revision: 975
          http://vcs.pcre.org/viewvc?view=rev&revision=975
Author:   ph10
Date:     2012-06-02 12:03:06 +0100 (Sat, 02 Jun 2012)


Log Message:
-----------
Document update for 8.31-RC1 test release.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/Makefile.am
    code/trunk/NEWS
    code/trunk/README
    code/trunk/configure.ac
    code/trunk/doc/html/index.html
    code/trunk/doc/html/pcre16.html
    code/trunk/doc/html/pcre_assign_jit_stack.html
    code/trunk/doc/html/pcre_compile.html
    code/trunk/doc/html/pcre_compile2.html
    code/trunk/doc/html/pcre_jit_stack_alloc.html
    code/trunk/doc/html/pcreapi.html
    code/trunk/doc/html/pcrebuild.html
    code/trunk/doc/html/pcrecompat.html
    code/trunk/doc/html/pcrecpp.html
    code/trunk/doc/html/pcregrep.html
    code/trunk/doc/html/pcrejit.html
    code/trunk/doc/html/pcrelimits.html
    code/trunk/doc/html/pcrepartial.html
    code/trunk/doc/html/pcrepattern.html
    code/trunk/doc/html/pcresyntax.html
    code/trunk/doc/html/pcretest.html
    code/trunk/doc/html/pcreunicode.html
    code/trunk/doc/pcre.txt
    code/trunk/doc/pcre16.3
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcregrep.1
    code/trunk/doc/pcrejit.3
    code/trunk/doc/pcrepartial.3
    code/trunk/doc/pcrepattern.3
    code/trunk/doc/pcretest.1
    code/trunk/pcre_compile.c
    code/trunk/pcre_dfa_exec.c
    code/trunk/pcre_exec.c
    code/trunk/pcre_fullinfo.c
    code/trunk/pcre_internal.h
    code/trunk/pcre_tables.c
    code/trunk/pcregrep.c
    code/trunk/pcreposix.c
    code/trunk/pcretest.c


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/ChangeLog    2012-06-02 11:03:06 UTC (rev 975)
@@ -1,8 +1,8 @@
 ChangeLog for PCRE
 ------------------


-Version 8.31
------------------------------
+Version 8.31 02-June-2012
+-------------------------

1. Fixing a wrong JIT test case and some compiler warnings.

@@ -95,20 +95,20 @@
     \w+ when the character tables indicated that \x{c4} was a word character.
     There were several related cases, all because the tests for doing a table
     lookup were testing for characters less than 127 instead of 255.
-    
+
 27. If a pattern contains capturing parentheses that are not used in a match,
-    their slots in the ovector are set to -1. For those that are higher than 
-    any matched groups, this happens at the end of processing. In the case when 
-    there were back references that the ovector was too small to contain 
-    (causing temporary malloc'd memory to be used during matching), and the 
-    highest capturing number was not used, memory off the end of the ovector 
-    was incorrectly being set to -1. (It was using the size of the temporary 
+    their slots in the ovector are set to -1. For those that are higher than
+    any matched groups, this happens at the end of processing. In the case when
+    there were back references that the ovector was too small to contain
+    (causing temporary malloc'd memory to be used during matching), and the
+    highest capturing number was not used, memory off the end of the ovector
+    was incorrectly being set to -1. (It was using the size of the temporary
     memory instead of the true size.)
-    
+
 28. To catch bugs like 27 using valgrind, when pcretest is asked to specify an
     ovector size, it uses memory at the end of the block that it has got.
-    
-29. Check for an overlong MARK name and give an error at compile time. The 
+
+29. Check for an overlong MARK name and give an error at compile time. The
     limit is 255 for the 8-bit library and 65535 for the 16-bit library.


30. JIT compiler update.
@@ -120,7 +120,7 @@

33. Variable renamings in the PCRE-JIT compiler. No functionality change.

-34. Fixed typos in pcregrep: in two places there was SUPPORT_LIBZ2 instead of 
+34. Fixed typos in pcregrep: in two places there was SUPPORT_LIBZ2 instead of
     SUPPORT_LIBBZ2. This caused a build problem when bzip2 but not gzip (zlib)
     was enabled.
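
Editor's note on ChangeLog item 27: the bug was one of bounds. Slots for unused groups were reset to -1 using the size of the temporary memory rather than the true ovector size. The sketch below illustrates the corrected idea only. It is not PCRE's code: the function name and parameters are hypothetical, and it ignores the fact that PCRE reserves part of the ovector as workspace.

```c
#include <assert.h>

/* Hypothetical helper, not taken from PCRE. Marks the offset pairs of capture
   groups numbered above top_matched as unset (-1). ovec_count is the number
   of ints in the caller's vector; the loop is clamped to it so that a larger
   temporary size can never cause writes past the end of the vector. */
static void clear_unset_groups(int *ovector, int ovec_count,
                               int top_matched, int top_group)
{
    int first = 2 * (top_matched + 1);  /* first slot after the matched groups */
    int last  = 2 * (top_group + 1);    /* one past the last group's pair */
    int i;

    assert(ovector != 0 && ovec_count >= 0);
    if (last > ovec_count) last = ovec_count;  /* clamp to the true size */
    for (i = first; i < last; i++) ovector[i] = -1;
}
```

With a six-int vector, one matched group, and five groups in the pattern, only slots 4 and 5 are reset. Memory beyond the vector is left alone.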



Modified: code/trunk/Makefile.am
===================================================================
--- code/trunk/Makefile.am    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/Makefile.am    2012-06-02 11:03:06 UTC (rev 975)
@@ -356,6 +356,8 @@
 endif # WITH_PCRE8


EXTRA_DIST += \
+ testdata/grepbinary \
+ testdata/grepfilelist \
testdata/grepinput \
testdata/grepinput3 \
testdata/grepinput8 \

Modified: code/trunk/NEWS
===================================================================
--- code/trunk/NEWS    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/NEWS    2012-06-02 11:03:06 UTC (rev 975)
@@ -1,6 +1,32 @@
 News about PCRE releases
 ------------------------


+Release 8.31 02-June-2012
+-------------------------
+
+This is mainly a bug-fixing release, with a small number of developments:
+
+. The JIT compiler now supports partial matching and the (*MARK) and
+ (*COMMIT) verbs.
+
+. PCRE_INFO_MAXLOOKBEHIND can be used to find the longest lookbehind in a
+ pattern.
+
+. There should be a performance improvement when using the heap instead of the
+ stack for recursion.
+
+. pcregrep can now be linked with libedit as an alternative to libreadline.
+
+. pcregrep now has a --file-list option where the list of files to scan is
+ given as a file.
+
+. pcregrep now recognizes binary files and there are related options.
+
+. The Unicode tables have been updated to 6.1.0.
+
+As always, the full list of changes is in the ChangeLog file.
+
+
Release 8.30 04-February-2012
-----------------------------


Modified: code/trunk/README
===================================================================
--- code/trunk/README    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/README    2012-06-02 11:03:06 UTC (rev 975)
@@ -334,7 +334,7 @@
   the readline() function. This provides line-editing and history facilities.
   Note that libreadline is GPL-licenced, so if you distribute a binary of
   pcretest linked in this way, there may be licensing issues. These can be
-  avoided by linking with libedit (which has a BSD licence) instead. 
+  avoided by linking with libedit (which has a BSD licence) instead.


Enabling libreadline causes the -lreadline option to be added to the pcretest
build. In many operating environments with a system-installed readline

Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/configure.ac    2012-06-02 11:03:06 UTC (rev 975)
@@ -568,17 +568,17 @@
    fi
  fi
 fi
-  


+
# Check for the availability of libedit. Different distributions put its
# headers in different places. Try to cover the most common ones.

 if test "$enable_pcretest_libedit" = "yes"; then
   AC_CHECK_HEADERS([editline/readline.h], [HAVE_EDITLINE_READLINE_H=1],
     [AC_CHECK_HEADERS([edit/readline/readline.h], [HAVE_READLINE_READLINE_H=1],
-      [AC_CHECK_HEADERS([readline/readline.h], [HAVE_READLINE_READLINE_H=1])])]) 
+      [AC_CHECK_HEADERS([readline/readline.h], [HAVE_READLINE_READLINE_H=1])])])
   AC_CHECK_LIB([edit], [readline], [LIBEDIT="-ledit"])
-fi   
+fi


# This facilitates -ansi builds under Linux
dnl AC_DEFINE([_GNU_SOURCE], [], [Enable GNU extensions in glibc])

Modified: code/trunk/doc/html/index.html
===================================================================
--- code/trunk/doc/html/index.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/index.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -1,10 +1,10 @@
 <html>
-<!-- This is a manually maintained file that is the root of the HTML version of 
-     the PCRE documentation. When the HTML documents are built from the man 
-     page versions, the entire doc/html directory is emptied, this file is then 
-     copied into doc/html/index.html, and the remaining files therein are 
+<!-- This is a manually maintained file that is the root of the HTML version of
+     the PCRE documentation. When the HTML documents are built from the man
+     page versions, the entire doc/html directory is emptied, this file is then
+     copied into doc/html/index.html, and the remaining files therein are
      created by the 132html script.
--->      
+-->
 <head>
 <title>PCRE specification</title>
 </head>
@@ -86,11 +86,11 @@
 </table>


<p>
-There are also individual pages that summarize the interface for each function
+There are also individual pages that summarize the interface for each function
in the library. There is a single page for each pair of 8-bit/16-bit functions.
</p>

-<table>    
+<table>


 <tr><td><a href="pcre_assign_jit_stack.html">pcre_assign_jit_stack</a></td>
     <td>&nbsp;&nbsp;Assign stack for JIT matching</td></tr>
@@ -153,7 +153,7 @@


 <tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
     <td>&nbsp;&nbsp;Build character tables in current locale</td></tr>
-    
+
 <tr><td><a href="pcre_pattern_to_host_byte_order.html">pcre_pattern_to_host_byte_order</a></td>
     <td>&nbsp;&nbsp;Convert compiled pattern to host byte order if necessary</td></tr>



Modified: code/trunk/doc/html/pcre16.html
===================================================================
--- code/trunk/doc/html/pcre16.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcre16.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -273,12 +273,12 @@
 <P>
 There are two new general option names, PCRE_UTF16 and PCRE_NO_UTF16_CHECK,
 which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In
-fact, these new options define the same bits in the options word. There is a 
+fact, these new options define the same bits in the options word. There is a
 discussion about the
 <a href="pcreunicode.html#utf16strings">validity of UTF-16 strings</a>
 in the
 <a href="pcreunicode.html"><b>pcreunicode</b></a>
-page. 
+page.
 </P>
 <P>
 For the <b>pcre16_config()</b> function there is an option PCRE_CONFIG_UTF16


Modified: code/trunk/doc/html/pcre_assign_jit_stack.html
===================================================================
--- code/trunk/doc/html/pcre_assign_jit_stack.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcre_assign_jit_stack.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -30,7 +30,7 @@
 DESCRIPTION
 </b><br>
 <P>
-This function provides control over the memory used as a stack at runtime by a
+This function provides control over the memory used as a stack at run-time by a
 call to <b>pcre[16]_exec()</b> with a pattern that has been successfully
 compiled with JIT optimization. The arguments are:
 <pre>


Modified: code/trunk/doc/html/pcre_compile.html
===================================================================
--- code/trunk/doc/html/pcre_compile.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcre_compile.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -54,7 +54,7 @@
   PCRE_DOLLAR_ENDONLY     $ not to match newline at end
   PCRE_DOTALL             . matches anything including NL
   PCRE_DUPNAMES           Allow duplicate names for subpatterns
-  PCRE_EXTENDED           Ignore whitespace and # comments
+  PCRE_EXTENDED           Ignore white space and # comments
   PCRE_EXTRA              PCRE extra features
                             (not much use currently)
   PCRE_FIRSTLINE          Force matching to be before newline


Modified: code/trunk/doc/html/pcre_compile2.html
===================================================================
--- code/trunk/doc/html/pcre_compile2.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcre_compile2.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -57,7 +57,7 @@
   PCRE_DOLLAR_ENDONLY     $ not to match newline at end
   PCRE_DOTALL             . matches anything including NL
   PCRE_DUPNAMES           Allow duplicate names for subpatterns
-  PCRE_EXTENDED           Ignore whitespace and # comments
+  PCRE_EXTENDED           Ignore white space and # comments
   PCRE_EXTRA              PCRE extra features
                             (not much use currently)
   PCRE_FIRSTLINE          Force matching to be before newline


Modified: code/trunk/doc/html/pcre_jit_stack_alloc.html
===================================================================
--- code/trunk/doc/html/pcre_jit_stack_alloc.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcre_jit_stack_alloc.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -33,7 +33,7 @@
 This function is used to create a stack for use by the code compiled by the JIT
 optimization of <b>pcre[16]_study()</b>. The arguments are a starting size for
 the stack, and a maximum size to which it is allowed to grow. The result can be
-passed to the JIT runtime code by <b>pcre[16]_assign_jit_stack()</b>, or that
+passed to the JIT run-time code by <b>pcre[16]_assign_jit_stack()</b>, or that
 function can set up a callback for obtaining a stack. A maximum stack size of
 512K to 1M should be more than enough for any pattern. For more details, see
 the


Modified: code/trunk/doc/html/pcreapi.html
===================================================================
--- code/trunk/doc/html/pcreapi.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcreapi.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -317,7 +317,7 @@
 strings: a single CR (carriage return) character, a single LF (linefeed)
 character, the two-character sequence CRLF, any of the three preceding, or any
 Unicode newline sequence. The Unicode newline sequences are the three just
-mentioned, plus the single characters VT (vertical tab, U+000B), FF (formfeed,
+mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed,
 U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
 (paragraph separator, U+2029).
 </P>
@@ -641,8 +641,8 @@
 <pre>
   PCRE_EXTENDED
 </pre>
-If this bit is set, whitespace data characters in the pattern are totally
-ignored except when escaped or inside a character class. Whitespace does not
+If this bit is set, white space data characters in the pattern are totally
+ignored except when escaped or inside a character class. White space does not
 include the VT character (code 11). In addition, characters between an
 unescaped # outside a character class and the next newline, inclusive, are also
 ignored. This is equivalent to Perl's /x option, and it can be changed within a
@@ -659,7 +659,7 @@
 </P>
 <P>
 This option makes it possible to include comments inside complicated patterns.
-Note, however, that this applies only to data characters. Whitespace characters
+Note, however, that this applies only to data characters. White space characters
 may never appear within special character sequences in a pattern, for example
 within the sequence (?( that introduces a conditional subpattern.
 <pre>
@@ -745,7 +745,7 @@
 preceding sequences should be recognized. Setting PCRE_NEWLINE_ANY specifies
 that any Unicode newline sequence should be recognized. The Unicode newline
 sequences are the three just mentioned, plus the single characters VT (vertical
-tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
+tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
 separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit
 library, the last two are recognized only in UTF-8 mode.
 </P>
@@ -759,7 +759,7 @@
 </P>
 <P>
 The only time that a line break in a pattern is specially recognized when
-compiling is when PCRE_EXTENDED is set. CR and LF are whitespace characters,
+compiling is when PCRE_EXTENDED is set. CR and LF are white space characters,
 and so are ignored in this mode. Also, an unescaped # outside a character class
 indicates a comment that lasts until after the next line break sequence. In
 other circumstances, line break sequences in patterns are treated as literal
@@ -916,6 +916,7 @@
   72  too many forward references
   73  disallowed Unicode code point (&#62;= 0xd800 && &#60;= 0xdfff)
   74  invalid UTF-16 string (specifically UTF-16)
+  75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
 </pre>
 The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
 be used if the limits were changed when PCRE was built.
@@ -950,7 +951,7 @@
 </P>
 <P>
 The second argument of <b>pcre_study()</b> contains option bits. There are three
-options: 
+options:
 <pre>
   PCRE_STUDY_JIT_COMPILE
   PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
@@ -1231,7 +1232,7 @@
 </pre>
 Return the number of characters (NB not bytes) in the longest lookbehind
 assertion in the pattern. Note that the simple assertions \b and \B require a
-one-character lookbehind. This information is useful when doing multi-segment 
+one-character lookbehind. This information is useful when doing multi-segment
 matching using the partial matching facilities.
 <pre>
   PCRE_INFO_MINLENGTH
@@ -1506,7 +1507,7 @@
 Limiting the recursion depth limits the amount of machine stack that can be
 used, or, when PCRE has been compiled to use memory on the heap instead of the
 stack, the amount of heap memory that can be used. This limit is not relevant,
-and is ignored, when matching is done using JIT compiled code. 
+and is ignored, when matching is done using JIT compiled code.
 </P>
 <P>
 The default value for <i>match_limit_recursion</i> can be set when PCRE is
@@ -1689,7 +1690,7 @@
 "no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
 are considered at every possible starting position in the subject string. If
 PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching
-time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set, 
+time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set,
 matching is always done interpretively.
 </P>
 <P>
@@ -2084,12 +2085,12 @@
 <a href="pcrejit.html"><b>pcrejit</b></a>
 documentation for more details.
 <pre>
-  PCRE_ERROR_BADMODE (-28)
+  PCRE_ERROR_BADMODE        (-28)
 </pre>
 This error is given if a pattern that was compiled by the 8-bit library is
 passed to a 16-bit library function, or vice versa.
 <pre>
-  PCRE_ERROR_BADENDIANNESS (-29)
+  PCRE_ERROR_BADENDIANNESS  (-29)
 </pre>
 This error is given if a pattern that was compiled and saved is reloaded on a
 host with different endianness. The utility function
@@ -2097,7 +2098,7 @@
 so that it runs on the new host.
 </P>
 <P>
-Error numbers -16 to -20 and -22 are not used by <b>pcre_exec()</b>.
+Error numbers -16 to -20, -22, and -30 are not used by <b>pcre_exec()</b>.
 <a name="badutf8reasons"></a></P>
 <br><b>
 Reason codes for invalid UTF-8 strings
@@ -2592,6 +2593,13 @@
 recursively, using private vectors for <i>ovector</i> and <i>workspace</i>. This
 error is given if the output vector is not large enough. This should be
 extremely rare, as a vector of size 1000 is used.
+<pre>
+  PCRE_ERROR_DFA_BADRESTART (-30)
+</pre>
+When <b>pcre_dfa_exec()</b> is called with the <b>PCRE_DFA_RESTART</b> option,
+some plausibility checks are made on the contents of the workspace, which
+should contain data about the previous partial match. If any of these checks
+fail, this error is given.
 </P>
 <br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
 <P>
@@ -2610,7 +2618,7 @@
 </P>
 <br><a name="SEC26" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 14 April 2012
+Last updated: 04 May 2012
 <br>
 Copyright &copy; 1997-2012 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcrebuild.html
===================================================================
--- code/trunk/doc/html/pcrebuild.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrebuild.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -127,7 +127,7 @@
 </P>
 <P>
 If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects
-its input to be either ASCII or UTF-8 (depending on the runtime option). It is
+its input to be either ASCII or UTF-8 (depending on the run-time option). It is
 not possible to support both EBCDIC and UTF-8 codes in the same version of the
 library. Consequently, --enable-utf and --enable-ebcdic are mutually
 exclusive.
@@ -317,7 +317,7 @@
 </pre>
 to the <b>configure</b> command, the distributed tables are no longer used.
 Instead, a program called <b>dftables</b> is compiled and run. This outputs the
-source for new set of tables, created in the default locale of your C runtime
+source for new set of tables, created in the default locale of your C run-time
 system. (This method of replacing the tables does not work if you are cross
 compiling, because <b>dftables</b> is run on the local host. If you need to
 create alternative tables when cross compiling, you will have to do so "by


Modified: code/trunk/doc/html/pcrecompat.html
===================================================================
--- code/trunk/doc/html/pcrecompat.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrecompat.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -107,8 +107,16 @@
 page.
 </P>
 <P>
-11. If (*THEN) is present in a group that is called as a subroutine, its action
-is limited to that group, even if the group does not contain any | characters.
+11. If any of the backtracking control verbs are used in an assertion or in a
+subpattern that is called as a subroutine (whether or not recursively), their
+effect is confined to that subpattern; it does not extend to the surrounding
+pattern. This is not always the case in Perl. In particular, if (*THEN) is
+present in a group that is called as a subroutine, its action is limited to
+that group, even if the group does not contain any | characters. There is one
+exception to this: the name from a (*MARK), (*PRUNE), or (*THEN) that is
+encountered in a successful positive assertion <i>is</i> passed back when a
+match succeeds (compare capturing parentheses in assertions). Note that such
+subpatterns are processed as anchored at the point where they are tested.
 </P>
 <P>
 12. There are some differences that are concerned with the settings of captured
@@ -129,7 +137,7 @@
 <P>
 14. Perl recognizes comments in some places that PCRE does not, for example,
 between the ( and ? at the start of a subpattern. If the /x modifier is set,
-Perl allows whitespace between ( and ? but PCRE never does, even if the
+Perl allows white space between ( and ? but PCRE never does, even if the
 PCRE_EXTENDED option is set.
 </P>
 <P>
@@ -203,7 +211,7 @@
 REVISION
 </b><br>
 <P>
-Last updated: 08 Januray 2012
+Last updated: 01 June 2012
 <br>
 Copyright &copy; 1997-2012 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcrecpp.html
===================================================================
--- code/trunk/doc/html/pcrecpp.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrecpp.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -192,7 +192,7 @@
    PCRE_DOTALL           dot matches newlines        /s
    PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
    PCRE_EXTRA            strict escape parsing       N/A
-   PCRE_EXTENDED         ignore whitespaces          /x
+   PCRE_EXTENDED         ignore white spaces         /x
    PCRE_UTF8             handles UTF8 chars          built-in
    PCRE_UNGREEDY         reverses * and *?           N/A
    PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)


Modified: code/trunk/doc/html/pcregrep.html
===================================================================
--- code/trunk/doc/html/pcregrep.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcregrep.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -128,7 +128,7 @@
 </P>
 <br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
 <P>
-By default, a file that contains a binary zero byte within the first 1024 bytes 
+By default, a file that contains a binary zero byte within the first 1024 bytes
 is identified as a binary file, and is processed specially. (GNU grep also
 identifies binary files in this manner.) See the <b>--binary-files</b> option
 for a means of changing the way binary files are handled.
@@ -172,7 +172,7 @@
 </P>
 <P>
 <b>--binary-files=</b><i>word</i>
-Specify how binary files are to be processed. If the word is "binary" (the 
+Specify how binary files are to be processed. If the word is "binary" (the
 default), pattern matching is performed on binary files, but the only output is
 "Binary file &#60;name&#62; matches" when a match succeeds. If the word is "text",
 which is equivalent to the <b>-a</b> or <b>--text</b> option, binary files are
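
Editor's note: the default binary-file test described in this hunk (a zero byte within the first 1024 bytes) can be sketched as follows. This is a minimal illustration, not pcregrep's source. The names looks_binary and BINARY_SCAN_LIMIT are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BINARY_SCAN_LIMIT 1024  /* the "first 1024 bytes" from the text */

/* Hypothetical helper: returns 1 if a zero byte occurs within the first
   BINARY_SCAN_LIMIT bytes of buf, otherwise 0. */
static int looks_binary(const unsigned char *buf, size_t len)
{
    size_t n = len < BINARY_SCAN_LIMIT ? len : BINARY_SCAN_LIMIT;

    assert(buf != NULL);
    return memchr(buf, 0, n) != NULL;
}
```

A zero byte that first appears after the scan limit does not mark the file as binary under this rule.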


Modified: code/trunk/doc/html/pcrejit.html
===================================================================
--- code/trunk/doc/html/pcrejit.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrejit.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -104,7 +104,7 @@
       pcre_free(study_ptr);
   #endif
 </pre>
-PCRE_STUDY_JIT_COMPILE requests the JIT compiler to generate code for complete 
+PCRE_STUDY_JIT_COMPILE requests the JIT compiler to generate code for complete
 matches. If you want to run partial matches using the PCRE_PARTIAL_HARD or
 PCRE_PARTIAL_SOFT options of <b>pcre_exec()</b>, you should set one or both of
 the following options in addition to, or instead of, PCRE_STUDY_JIT_COMPILE
@@ -129,7 +129,7 @@
 no JIT data is created. Otherwise, the compiled pattern is passed to the JIT
 compiler, which turns it into machine code that executes much faster than the
 normal interpretive code. When <b>pcre_exec()</b> is passed a <b>pcre_extra</b>
-block containing a pointer to JIT code of the appropriate mode (normal or 
+block containing a pointer to JIT code of the appropriate mode (normal or
 hard/soft partial), it obeys that code instead of running the interpreter. The
 result is identical, but the compiled JIT code runs much faster.
 </P>
@@ -169,10 +169,8 @@
 <pre>
   \C             match a single byte; not supported in UTF-8 mode
   (?Cn)          callouts
-  (*COMMIT)      )
-  (*MARK)        )
-  (*PRUNE)       ) the backtracking control verbs
-  (*SKIP)        )
+  (*PRUNE)       )
+  (*SKIP)        ) backtracking control verbs
   (*THEN)        )
 </pre>
 Support for some of these may be added in future.
@@ -250,15 +248,15 @@
   (2) If <i>callback</i> is NULL and <i>data</i> is not NULL, <i>data</i> must be
       a valid JIT stack, the result of calling <b>pcre_jit_stack_alloc()</b>.


-  (3) If <i>callback</i> is not NULL, it must point to a function that is 
-      called with <i>data</i> as an argument at the start of matching, in 
-      order to set up a JIT stack. If the return from the callback 
-      function is NULL, the internal 32K stack is used; otherwise the 
-      return value must be a valid JIT stack, the result of calling 
+  (3) If <i>callback</i> is not NULL, it must point to a function that is
+      called with <i>data</i> as an argument at the start of matching, in
+      order to set up a JIT stack. If the return from the callback
+      function is NULL, the internal 32K stack is used; otherwise the
+      return value must be a valid JIT stack, the result of calling
       <b>pcre_jit_stack_alloc()</b>.
 </pre>
-A callback function is obeyed whenever JIT code is about to be run; it is not 
-obeyed when <b>pcre_exec()</b> is called with options that are incompatible for 
+A callback function is obeyed whenever JIT code is about to be run; it is not
+obeyed when <b>pcre_exec()</b> is called with options that are incompatible for
 JIT execution. A callback function can therefore be used to determine whether a
 match operation was executed by JIT or by the interpreter.
 </P>
@@ -266,9 +264,9 @@
 You may safely use the same JIT stack for more than one pattern (either by
 assigning directly or by callback), as long as the patterns are all matched
 sequentially in the same thread. In a multithread application, if you do not
-specify a JIT stack, or if you assign or pass back NULL from a callback, that 
-is thread-safe, because each thread has its own machine stack. However, if you 
-assign or pass back a non-NULL JIT stack, this must be a different stack for 
+specify a JIT stack, or if you assign or pass back NULL from a callback, that
+is thread-safe, because each thread has its own machine stack. However, if you
+assign or pass back a non-NULL JIT stack, this must be a different stack for
 each thread so that the application is thread-safe.
 </P>
 <P>
@@ -415,7 +413,7 @@
 </P>
 <br><a name="SEC13" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 14 April 2012
+Last updated: 04 May 2012
 <br>
 Copyright &copy; 1997-2012 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcrelimits.html
===================================================================
--- code/trunk/doc/html/pcrelimits.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrelimits.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -48,6 +48,10 @@
 maximum number of named subpatterns is 10000.
 </P>
 <P>
+The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
+is 255 for the 8-bit library and 65535 for the 16-bit library.
+</P>
+<P>
 The maximum length of a subject string is the largest positive number that an
 integer variable can hold. However, when using the traditional matching
 function, PCRE uses recursion to handle subpatterns and indefinite repetition.
@@ -72,7 +76,7 @@
 REVISION
 </b><br>
 <P>
-Last updated: 08 January 2012
+Last updated: 04 May 2012
 <br>
 Copyright &copy; 1997-2012 University of Cambridge.
 <br>
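
Editor's note: the name-length limit added to pcrelimits (and reported as compile error 75) amounts to a simple comparison. The sketch below assumes the length is counted in code units of the library in use. The function names are hypothetical and do not appear in PCRE.

```c
#include <assert.h>
#include <stddef.h>

/* Maximum verb-name length for the 8-bit and 16-bit libraries, using the
   limits quoted in the text (255 and 65535). */
static size_t max_verb_name_length(int code_unit_bits)
{
    assert(code_unit_bits == 8 || code_unit_bits == 16);
    return code_unit_bits == 8 ? 255u : 65535u;
}

/* Returns 1 if a (*MARK), (*PRUNE), (*SKIP), or (*THEN) name of the given
   length would exceed the limit, otherwise 0. */
static int verb_name_too_long(size_t name_length, int code_unit_bits)
{
    return name_length > max_verb_name_length(code_unit_bits);
}
```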


Modified: code/trunk/doc/html/pcrepartial.html
===================================================================
--- code/trunk/doc/html/pcrepartial.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrepartial.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -58,14 +58,14 @@
 are set, PCRE_PARTIAL_HARD takes precedence.
 </P>
 <P>
-If you want to use partial matching with just-in-time optimized code, you must 
+If you want to use partial matching with just-in-time optimized code, you must
 call <b>pcre_study()</b> or <b>pcre16_study()</b> with one or both of these
 options:
 <pre>
   PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
   PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
 </pre>
-PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial 
+PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial
 matches on the same pattern. If the appropriate JIT study mode has not been set
 for a match, the interpretive matching code is used.
 </P>
@@ -354,8 +354,8 @@
 <P>
 2. Lookbehind assertions that have already been obeyed are catered for in the
 offsets that are returned for a partial match. However a lookbehind assertion
-later in the pattern could require even earlier characters to be inspected. You 
-can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the 
+later in the pattern could require even earlier characters to be inspected. You
+can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the
 <b>pcre_fullinfo()</b> or <b>pcre16_fullinfo()</b> functions to obtain the length
 of the largest lookbehind in the pattern. This length is given in characters,
 not bytes. If you always retain at least that many characters before the
@@ -372,7 +372,7 @@
   data&#62; ab\P
   No match
 </pre>
-If the next segment begins "cx", a match should be found, but this will only 
+If the next segment begins "cx", a match should be found, but this will only
 happen if characters from the previous segment are retained. For this reason, a
 "no match" result should be interpreted as "partial match of an empty string"
 when the pattern contains lookbehinds.
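
Editor's note: the retention rule for multi-segment matching can be sketched as a small calculation. It assumes max_lookbehind was obtained with PCRE_INFO_MAXLOOKBEHIND and that all offsets are in characters. The function name and parameters are hypothetical. A "no match" result is handled by passing the segment length as match_start, which treats it as a partial match of an empty string at the end of the segment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset in the current segment from which
   characters should be kept when the next segment is appended. Everything
   before that offset can be discarded without affecting lookbehinds. */
static size_t retain_from(size_t match_start, size_t segment_length,
                          size_t max_lookbehind)
{
    assert(match_start <= segment_length);
    return match_start > max_lookbehind ? match_start - max_lookbehind : 0;
}
```

For example, a partial match starting at offset 10 with a maximum lookbehind of 3 means characters from offset 7 onwards are kept.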


Modified: code/trunk/doc/html/pcrepattern.html
===================================================================
--- code/trunk/doc/html/pcrepattern.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrepattern.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -227,10 +227,10 @@
 greater than 127) are treated as literals.
 </P>
 <P>
-If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the
+If a pattern is compiled with the PCRE_EXTENDED option, white space in the
 pattern (other than in a character class) and characters between a # outside
 a character class and the next newline are ignored. An escaping backslash can
-be used to include a whitespace or # character as part of the pattern.
+be used to include a white space or # character as part of the pattern.
 </P>
 <P>
 If you want to remove the special meaning from a sequence of characters, you
@@ -264,7 +264,7 @@
   \a        alarm, that is, the BEL character (hex 07)
   \cx       "control-x", where x is any ASCII character
   \e        escape (hex 1B)
-  \f        formfeed (hex 0C)
+  \f        form feed (hex 0C)
   \n        linefeed (hex 0A)
   \r        carriage return (hex 0D)
   \t        tab (hex 09)
@@ -406,12 +406,12 @@
 <pre>
   \d     any decimal digit
   \D     any character that is not a decimal digit
-  \h     any horizontal whitespace character
-  \H     any character that is not a horizontal whitespace character
-  \s     any whitespace character
-  \S     any character that is not a whitespace character
-  \v     any vertical whitespace character
-  \V     any character that is not a vertical whitespace character
+  \h     any horizontal white space character
+  \H     any character that is not a horizontal white space character
+  \s     any white space character
+  \S     any character that is not a white space character
+  \v     any vertical white space character
+  \V     any character that is not a vertical white space character
   \w     any "word" character
   \W     any "non-word" character
 </pre>
@@ -497,7 +497,7 @@
 <pre>
   U+000A     Linefeed
   U+000B     Vertical tab
-  U+000C     Formfeed
+  U+000C     Form feed
   U+000D     Carriage return
   U+0085     Next line
   U+2028     Line separator
@@ -520,7 +520,7 @@
 <a href="#atomicgroup">below.</a>
 This particular group matches either the two-character sequence CR followed by
 LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,
-U+000B), FF (formfeed, U+000C), CR (carriage return, U+000D), or NEL (next
+U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next
 line, U+0085). The two-character sequence is treated as a single unit that
 cannot be split.
 </P>
@@ -822,7 +822,7 @@
   Xwd   Any Perl "word" character
 </pre>
 Xan matches characters that have either the L (letter) or the N (number)
-property. Xps matches the characters tab, linefeed, vertical tab, formfeed, or
+property. Xps matches the characters tab, linefeed, vertical tab, form feed, or
 carriage return, and any other character that has the Z (separator) property.
 Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the
 same characters as Xan, plus underscore.
@@ -1829,7 +1829,7 @@
 following a backslash are taken as part of a potential back reference number.
 If the pattern continues with a digit character, some delimiter must be used to
 terminate the back reference. If the PCRE_EXTENDED option is set, this can be
-whitespace. Otherwise, the \g{ syntax or an empty comment (see
+white space. Otherwise, the \g{ syntax or an empty comment (see
 <a href="#comments">"Comments"</a>
 below) can be used.
 </P>
@@ -2171,7 +2171,7 @@
 subroutines that can be referenced from elsewhere. (The use of
 <a href="#subpatternsassubroutines">subroutines</a>
 is described below.) For example, a pattern to match an IPv4 address such as
-"192.168.23.245" could be written like this (ignore whitespace and line
+"192.168.23.245" could be written like this (ignore white space and line
 breaks):
 <pre>
   (?(DEFINE) (?&#60;byte&#62; 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
@@ -2565,17 +2565,18 @@
 a successful positive assertion <i>is</i> passed back when a match succeeds
 (compare capturing parentheses in assertions). Note that such subpatterns are
 processed as anchored at the point where they are tested. Note also that Perl's
-treatment of subroutines is different in some cases.
+treatment of subroutines and assertions is different in some cases.
 </P>
 <P>
 The new verbs make use of what was previously invalid syntax: an opening
 parenthesis followed by an asterisk. They are generally of the form
 (*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,
 depending on whether or not an argument is present. A name is any sequence of
-characters that does not include a closing parenthesis. If the name is empty,
-that is, if the closing parenthesis immediately follows the colon, the effect
-is as if the colon were not there. Any number of these verbs may occur in a
-pattern.
+characters that does not include a closing parenthesis. The maximum length of
+name is 255 in the 8-bit library and 65535 in the 16-bit library. If the name
+is empty, that is, if the closing parenthesis immediately follows the colon,
+the effect is as if the colon were not there. Any number of these verbs may
+occur in a pattern.
 <a name="nooptimize"></a></P>
 <br><b>
 Optimizations that affect backtracking verbs
@@ -2593,7 +2594,7 @@
 <a href="pcreapi.html#execoptions">"Option bits for <b>pcre_exec()</b>"</a>
 in the
 <a href="pcreapi.html"><b>pcreapi</b></a>
-documentation. 
+documentation.
 </P>
 <P>
 Experiments with Perl suggest that it too has similar optimizations, sometimes
@@ -2687,7 +2688,7 @@
 </P>
 <P>
 If you are interested in (*MARK) values after failed matches, you should
-probably set the PCRE_NO_START_OPTIMIZE option 
+probably set the PCRE_NO_START_OPTIMIZE option
 <a href="#nooptimize">(see above)</a>
 to ensure that the match is always attempted.
 </P>
@@ -2868,7 +2869,7 @@
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 14 April 2012
+Last updated: 01 June 2012
 <br>
 Copyright &copy; 1997-2012 University of Cambridge.
 <br>
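
Note on the backtracking-verb hunk above: the two constraints stated for a verb name (no closing parenthesis, and at most 255 units in the 8-bit library) can be checked before a pattern is assembled. The following is a minimal sketch under stated assumptions, not PCRE code. The function verb_name_ok_8bit() and the constant MAX_VERB_NAME_8BIT are hypothetical names, and the limit is assumed to count 8-bit code units.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Limit quoted in the documentation text for the 8-bit library. */
enum { MAX_VERB_NAME_8BIT = 255 };

/*
 * Hypothetical pre-check (not a PCRE function): returns 1 if `name` could be
 * used as the NAME part of (*VERB:NAME) in the 8-bit library, 0 otherwise.
 * An empty name is accepted here because the text says it behaves as if the
 * colon were absent; callers building (*MARK:NAME), which always requires a
 * name, should reject the empty string themselves.
 */
static int verb_name_ok_8bit(const char *name)
{
    assert(name != NULL);

    /* A name is any sequence of characters without a closing parenthesis. */
    if (strchr(name, ')') != NULL)
        return 0;

    /* Assumption: the 255 limit is measured in 8-bit code units. */
    return strlen(name) <= (size_t)MAX_VERB_NAME_8BIT;
}
```

A name that fails this check is expected to make compilation fail; the pcre.txt part of this commit lists compile error 75 for an over-long name.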


Modified: code/trunk/doc/html/pcresyntax.html
===================================================================
--- code/trunk/doc/html/pcresyntax.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcresyntax.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -61,7 +61,7 @@
   \a         alarm, that is, the BEL character (hex 07)
   \cx        "control-x", where x is any ASCII character
   \e         escape (hex 1B)
-  \f         formfeed (hex 0C)
+  \f         form feed (hex 0C)
   \n         newline (hex 0A)
   \r         carriage return (hex 0D)
   \t         tab (hex 09)
@@ -78,16 +78,16 @@
   \C         one data unit, even in UTF mode (best avoided)
   \d         a decimal digit
   \D         a character that is not a decimal digit
-  \h         a horizontal whitespace character
-  \H         a character that is not a horizontal whitespace character
+  \h         a horizontal white space character
+  \H         a character that is not a horizontal white space character
   \N         a character that is not a newline
   \p{<i>xx</i>}     a character with the <i>xx</i> property
   \P{<i>xx</i>}     a character without the <i>xx</i> property
   \R         a newline sequence
-  \s         a whitespace character
-  \S         a character that is not a whitespace character
-  \v         a vertical whitespace character
-  \V         a character that is not a vertical whitespace character
+  \s         a white space character
+  \S         a character that is not a white space character
+  \v         a vertical white space character
+  \V         a character that is not a vertical white space character
   \w         a "word" character
   \W         a "non-word" character
   \X         an extended Unicode sequence
@@ -278,7 +278,7 @@
   lower       lower case letter
   print       printing, including space
   punct       printing, excluding alphanumeric
-  space       whitespace
+  space       white space
   upper       upper case letter
   word        same as \w
   xdigit      hexadecimal digit


Modified: code/trunk/doc/html/pcretest.html
===================================================================
--- code/trunk/doc/html/pcretest.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcretest.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -166,8 +166,8 @@
 Behave as if each pattern has the <b>/S</b> modifier; in other words, force each
 pattern to be studied. If <b>-s+</b> is used, all the JIT compile options are
 passed to <b>pcre[16]_study()</b>, causing just-in-time optimization to be set
-up if it is available, for both full and partial matching. Specific JIT compile 
-options can be selected by following <b>-s+</b> with a digit in the range 1 to 
+up if it is available, for both full and partial matching. Specific JIT compile
+options can be selected by following <b>-s+</b> with a digit in the range 1 to
 7, which selects the JIT compile modes as follows:
 <pre>
   1  normal match only
@@ -453,7 +453,7 @@
 If the <b>/S</b> modifier is immediately followed by a + character, the call to
 <b>pcre[16]_study()</b> is made with all the JIT study options, requesting
 just-in-time optimization support if it is available, for both normal and
-partial matching. If you want to restrict the JIT compiling modes, you can 
+partial matching. If you want to restrict the JIT compiling modes, you can
 follow <b>/S+</b> with a digit in the range 1 to 7:
 <pre>
   1  normal match only
@@ -469,7 +469,7 @@
 </P>
 <P>
 Note that there is also an independent <b>/+</b> modifier; it must not be given
-immediately after <b>/S</b> or <b>/S+</b> because this will be misinterpreted. 
+immediately after <b>/S</b> or <b>/S+</b> because this will be misinterpreted.
 </P>
 <P>
 If JIT studying is successful, the compiled JIT code will automatically be used


Modified: code/trunk/doc/html/pcreunicode.html
===================================================================
--- code/trunk/doc/html/pcreunicode.html    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcreunicode.html    2012-06-02 11:03:06 UTC (rev 975)
@@ -91,7 +91,7 @@
 <P>
 If an invalid UTF-8 string is passed to PCRE, an error return is given. At
 compile time, the only additional information is the offset to the first byte
-of the failing character. The runtime functions <b>pcre_exec()</b> and
+of the failing character. The run-time functions <b>pcre_exec()</b> and
 <b>pcre_dfa_exec()</b> also pass back this information, as well as a more
 detailed reason code if the caller has provided memory in which to do this.
 </P>
@@ -136,7 +136,7 @@
 <P>
 If an invalid UTF-16 string is passed to PCRE, an error return is given. At
 compile time, the only additional information is the offset to the first data
-unit of the failing character. The runtime functions <b>pcre16_exec()</b> and
+unit of the failing character. The run-time functions <b>pcre16_exec()</b> and
 <b>pcre16_dfa_exec()</b> also pass back this information, as well as a more
 detailed reason code if the caller has provided memory in which to do this.
 </P>
@@ -202,7 +202,7 @@
 low-valued characters, unless the PCRE_UCP option is set.
 </P>
 <P>
-8. However, the horizontal and vertical whitespace matching escapes (\h, \H,
+8. However, the horizontal and vertical white space matching escapes (\h, \H,
 \v, and \V) do match all the appropriate Unicode characters, whether or not
 PCRE_UCP is set.
 </P>


Modified: code/trunk/doc/pcre.txt
===================================================================
--- code/trunk/doc/pcre.txt    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcre.txt    2012-06-02 11:03:06 UTC (rev 975)
@@ -138,8 +138,8 @@
        Last updated: 10 January 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRE(3)                                                                PCRE(3)



@@ -464,8 +464,8 @@
        Last updated: 14 April 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREBUILD(3)                                                      PCREBUILD(3)



@@ -568,9 +568,9 @@
        tern compiling functions.


        If you set --enable-utf when compiling in an EBCDIC  environment,  PCRE
-       expects its input to be either ASCII or UTF-8 (depending on the runtime
-       option). It is not possible to support both EBCDIC and UTF-8  codes  in
-       the  same  version  of  the  library.  Consequently,  --enable-utf  and
+       expects  its  input  to be either ASCII or UTF-8 (depending on the run-
+       time option). It is not possible to support both EBCDIC and UTF-8 codes
+       in  the  same  version  of  the library. Consequently, --enable-utf and
        --enable-ebcdic are mutually exclusive.



@@ -761,9 +761,9 @@
        to the configure command, the distributed tables are  no  longer  used.
        Instead,  a  program  called dftables is compiled and run. This outputs
        the source for new set of tables, created in the default locale of your
-       C runtime system. (This method of replacing the tables does not work if
-       you are cross compiling, because dftables is run on the local host.  If
-       you  need  to  create alternative tables when cross compiling, you will
+       C  run-time  system. (This method of replacing the tables does not work
+       if you are cross compiling, because dftables is run on the local  host.
+       If you need to create alternative tables when cross compiling, you will
        have to do so "by hand".)



@@ -860,8 +860,8 @@
        Last updated: 07 January 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREMATCHING(3)                                                PCREMATCHING(3)



@@ -1067,8 +1067,8 @@
        Last updated: 08 January 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREAPI(3)                                                          PCREAPI(3)



@@ -1311,7 +1311,7 @@
        feed) character, the two-character sequence CRLF, any of the three pre-
        ceding,  or any Unicode newline sequence. The Unicode newline sequences
        are the three just mentioned, plus the single characters  VT  (vertical
-       tab,  U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
+       tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
        separator, U+2028), and PS (paragraph separator, U+2029).


        Each of the first three conventions is used by at least  one  operating
@@ -1625,8 +1625,8 @@


          PCRE_EXTENDED


-       If  this  bit  is  set,  whitespace  data characters in the pattern are
-       totally ignored except when escaped or inside a character class. White-
+       If  this  bit  is  set,  white space data characters in the pattern are
+       totally ignored except when escaped or inside a character class.  White
        space does not include the VT character (code 11). In addition, charac-
        ters between an unescaped # outside a character class and the next new-
        line,  inclusive,  are  also  ignored.  This is equivalent to Perl's /x
@@ -1642,7 +1642,7 @@


        This option makes it possible to include  comments  inside  complicated
        patterns.   Note,  however,  that this applies only to data characters.
-       Whitespace  characters  may  never  appear  within  special   character
+       White space  characters  may  never  appear  within  special  character
        sequences in a pattern, for example within the sequence (?( that intro-
        duces a conditional subpattern.


@@ -1727,7 +1727,7 @@
        that any of the three preceding sequences should be recognized. Setting
        PCRE_NEWLINE_ANY  specifies that any Unicode newline sequence should be
        recognized. The Unicode newline sequences are the three just mentioned,
-       plus  the  single  characters  VT (vertical tab, U+000B), FF (formfeed,
+       plus  the  single  characters VT (vertical tab, U+000B), FF (form feed,
        U+000C), NEL (next line, U+0085), LS (line separator, U+2028),  and  PS
        (paragraph  separator, U+2029). For the 8-bit library, the last two are
        recognized only in UTF-8 mode.
@@ -1741,7 +1741,7 @@
        cause an error.


        The only time that a line break in a pattern  is  specially  recognized
-       when  compiling  is when PCRE_EXTENDED is set. CR and LF are whitespace
+       when  compiling is when PCRE_EXTENDED is set. CR and LF are white space
        characters, and so are ignored in this mode. Also, an unescaped #  out-
        side  a  character class indicates a comment that lasts until after the
        next line break sequence. In other circumstances, line break  sequences
@@ -1894,6 +1894,7 @@
          72  too many forward references
          73  disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
          74  invalid UTF-16 string (specifically UTF-16)
+         75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)


        The numbers 32 and 10000 in errors 48 and 49  are  defaults;  different
        values may be used if the limits were changed when PCRE was built.
@@ -2993,19 +2994,19 @@
        for the just-in-time processing stack is  not  large  enough.  See  the
        pcrejit documentation for more details.


-         PCRE_ERROR_BADMODE (-28)
+         PCRE_ERROR_BADMODE        (-28)


        This error is given if a pattern that was compiled by the 8-bit library
        is passed to a 16-bit library function, or vice versa.


-         PCRE_ERROR_BADENDIANNESS (-29)
+         PCRE_ERROR_BADENDIANNESS  (-29)


        This error is given if  a  pattern  that  was  compiled  and  saved  is
        reloaded  on  a  host  with  different endianness. The utility function
        pcre_pattern_to_host_byte_order() can be used to convert such a pattern
        so that it runs on the new host.


-       Error numbers -16 to -20 and -22 are not used by pcre_exec().
+       Error numbers -16 to -20, -22, and -30 are not used by pcre_exec().


    Reason codes for invalid UTF-8 strings


@@ -3468,10 +3469,17 @@
        This error is given if the output vector  is  not  large  enough.  This
        should be extremely rare, as a vector of size 1000 is used.


+         PCRE_ERROR_DFA_BADRESTART (-30)


+       When  pcre_dfa_exec()  is called with the PCRE_DFA_RESTART option, some
+       plausibility checks are made on the contents of  the  workspace,  which
+       should  contain  data about the previous partial match. If any of these
+       checks fail, this error is given.
+
+
 SEE ALSO


-       pcre16(3),   pcrebuild(3),  pcrecallout(3),  pcrecpp(3)(3),  pcrematch-
+       pcre16(3),  pcrebuild(3),  pcrecallout(3),  pcrecpp(3)(3),   pcrematch-
        ing(3), pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3),
        pcrestack(3).


@@ -3485,11 +3493,11 @@

REVISION

-       Last updated: 14 April 2012
+       Last updated: 04 May 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRECALLOUT(3)                                                  PCRECALLOUT(3)



@@ -3687,8 +3695,8 @@
        Last updated: 08 Janurary 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRECOMPAT(3)                                                    PCRECOMPAT(3)



@@ -3777,9 +3785,17 @@
        There is a discussion that explains these differences in more detail in
        the section on recursion differences from Perl in the pcrepattern page.


-       11.  If  (*THEN)  is present in a group that is called as a subroutine,
-       its action is limited to that group, even if the group does not contain
-       any | characters.
+       11.  If  any of the backtracking control verbs are used in an assertion
+       or in a subpattern that is called  as  a  subroutine  (whether  or  not
+       recursively),  their effect is confined to that subpattern; it does not
+       extend to the surrounding pattern. This is not always the case in Perl.
+       In  particular,  if  (*THEN)  is present in a group that is called as a
+       subroutine, its action is limited to that group, even if the group does
+       not  contain any | characters. There is one exception to this: the name
+       from a *(MARK), (*PRUNE), or (*THEN) that is encountered in a  success-
+       ful  positive  assertion  is passed back when a match succeeds (compare
+       capturing parentheses in assertions). Note that  such  subpatterns  are
+       processed as anchored at the point where they are tested.


        12.  There are some differences that are concerned with the settings of
        captured strings when part of  a  pattern  is  repeated.  For  example,
@@ -3799,7 +3815,7 @@


        14.  Perl  recognizes  comments  in some places that PCRE does not, for
        example, between the ( and ? at the start of a subpattern.  If  the  /x
-       modifier  is set, Perl allows whitespace between ( and ? but PCRE never
+       modifier is set, Perl allows white space between ( and ? but PCRE never
        does, even if the PCRE_EXTENDED option is set.


        15. PCRE provides some extensions to the Perl regular expression facil-
@@ -3859,11 +3875,11 @@


REVISION

-       Last updated: 08 Januray 2012
+       Last updated: 01 June 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREPATTERN(3)                                                  PCREPATTERN(3)



@@ -4045,10 +4061,10 @@
        after  a  backslash.  All  other characters (in particular, those whose
        codepoints are greater than 127) are treated as literals.


-       If a pattern is compiled with the PCRE_EXTENDED option,  whitespace  in
+       If a pattern is compiled with the PCRE_EXTENDED option, white space  in
        the  pattern (other than in a character class) and characters between a
        # outside a character class and the next newline are ignored. An escap-
-       ing  backslash  can  be  used to include a whitespace or # character as
+       ing  backslash  can  be used to include a white space or # character as
        part of the pattern.


        If you want to remove the special meaning from a  sequence  of  charac-
@@ -4083,7 +4099,7 @@
          \a        alarm, that is, the BEL character (hex 07)
          \cx       "control-x", where x is any ASCII character
          \e        escape (hex 1B)
-         \f        formfeed (hex 0C)
+         \f        form feed (hex 0C)
          \n        linefeed (hex 0A)
          \r        carriage return (hex 0D)
          \t        tab (hex 09)
@@ -4212,12 +4228,12 @@


          \d     any decimal digit
          \D     any character that is not a decimal digit
-         \h     any horizontal whitespace character
-         \H     any character that is not a horizontal whitespace character
-         \s     any whitespace character
-         \S     any character that is not a whitespace character
-         \v     any vertical whitespace character
-         \V     any character that is not a vertical whitespace character
+         \h     any horizontal white space character
+         \H     any character that is not a horizontal white space character
+         \s     any white space character
+         \S     any character that is not a white space character
+         \v     any vertical white space character
+         \V     any character that is not a vertical white space character
          \w     any "word" character
          \W     any "non-word" character


@@ -4297,7 +4313,7 @@

          U+000A     Linefeed
          U+000B     Vertical tab
-         U+000C     Formfeed
+         U+000C     Form feed
          U+000D     Carriage return
          U+0085     Next line
          U+2028     Line separator
@@ -4317,9 +4333,9 @@
        This  is  an  example  of an "atomic group", details of which are given
        below.  This particular group matches either the two-character sequence
        CR  followed  by  LF,  or  one  of  the single characters LF (linefeed,
-       U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage
-       return, U+000D), or NEL (next line, U+0085). The two-character sequence
-       is treated as a single unit that cannot be split.
+       U+000A), VT (vertical tab, U+000B), FF (form feed,  U+000C),  CR  (car-
+       riage  return,  U+000D),  or NEL (next line, U+0085). The two-character
+       sequence is treated as a single unit that cannot be split.


        In other modes, two additional characters whose codepoints are  greater
        than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-
@@ -4519,7 +4535,7 @@


        Xan matches characters that have either the L (letter) or the  N  (num-
        ber)  property. Xps matches the characters tab, linefeed, vertical tab,
-       formfeed, or carriage return, and any other character that  has  the  Z
+       form feed, or carriage return, and any other character that has  the  Z
        (separator) property.  Xsp is the same as Xps, except that vertical tab
        is excluded. Xwd matches the same characters as Xan, plus underscore.


@@ -5484,8 +5500,8 @@
        its following a backslash are taken as part of a potential back  refer-
        ence  number.   If  the  pattern continues with a digit character, some
        delimiter must  be  used  to  terminate  the  back  reference.  If  the
-       PCRE_EXTENDED option is set, this can be whitespace. Otherwise, the \g{
-       syntax or an empty comment (see "Comments" below) can be used.
+       PCRE_EXTENDED  option  is  set, this can be white space. Otherwise, the
+       \g{ syntax or an empty comment (see "Comments" below) can be used.


    Recursive back references


@@ -5797,7 +5813,7 @@
        DEFINE  is that it can be used to define subroutines that can be refer-
        enced from elsewhere. (The use of subroutines is described below.)  For
        example,  a  pattern  to match an IPv4 address such as "192.168.23.245"
-       could be written like this (ignore whitespace and line breaks):
+       could be written like this (ignore white space and line breaks):


          (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
          \b (?&byte) (\.(?&byte)){3} \b
@@ -6188,82 +6204,83 @@
        that  is  encountered in a successful positive assertion is passed back
        when a match succeeds (compare capturing  parentheses  in  assertions).
        Note that such subpatterns are processed as anchored at the point where
-       they are tested. Note also that Perl's treatment of subroutines is dif-
-       ferent in some cases.
+       they are tested. Note also that Perl's  treatment  of  subroutines  and
+       assertions is different in some cases.


        The  new verbs make use of what was previously invalid syntax: an open-
        ing parenthesis followed by an asterisk. They are generally of the form
        (*VERB)  or (*VERB:NAME). Some may take either form, with differing be-
        haviour, depending on whether or not an argument is present. A name  is
        any sequence of characters that does not include a closing parenthesis.
-       If the name is empty, that is, if the closing  parenthesis  immediately
-       follows  the  colon,  the effect is as if the colon were not there. Any
-       number of these verbs may occur in a pattern.
+       The maximum length of name is 255 in the 8-bit library and 65535 in the
+       16-bit library. If the name is empty, that is, if the closing parenthe-
+       sis immediately follows the colon, the effect is as if the  colon  were
+       not there. Any number of these verbs may occur in a pattern.


    Optimizations that affect backtracking verbs


-       PCRE contains some optimizations that are used to speed up matching  by
+       PCRE  contains some optimizations that are used to speed up matching by
        running some checks at the start of each match attempt. For example, it
-       may know the minimum length of matching subject, or that  a  particular
-       character  must  be present. When one of these optimizations suppresses
-       the running of a match, any included backtracking verbs  will  not,  of
+       may  know  the minimum length of matching subject, or that a particular
+       character must be present. When one of these  optimizations  suppresses
+       the  running  of  a match, any included backtracking verbs will not, of
        course, be processed. You can suppress the start-of-match optimizations
-       by setting the PCRE_NO_START_OPTIMIZE  option  when  calling  pcre_com-
+       by  setting  the  PCRE_NO_START_OPTIMIZE  option when calling pcre_com-
        pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT).
        There is more discussion of this option in the section entitled "Option
        bits for pcre_exec()" in the pcreapi documentation.


-       Experiments  with  Perl  suggest that it too has similar optimizations,
+       Experiments with Perl suggest that it too  has  similar  optimizations,
        sometimes leading to anomalous results.


    Verbs that act immediately


-       The following verbs act as soon as they are encountered. They  may  not
+       The  following  verbs act as soon as they are encountered. They may not
        be followed by a name.


           (*ACCEPT)


-       This  verb causes the match to end successfully, skipping the remainder
-       of the pattern. However, when it is inside a subpattern that is  called
-       as  a  subroutine, only that subpattern is ended successfully. Matching
-       then continues at the outer level. If  (*ACCEPT)  is  inside  capturing
+       This verb causes the match to end successfully, skipping the  remainder
+       of  the pattern. However, when it is inside a subpattern that is called
+       as a subroutine, only that subpattern is ended  successfully.  Matching
+       then  continues  at  the  outer level. If (*ACCEPT) is inside capturing
        parentheses, the data so far is captured. For example:


          A((?:A|B(*ACCEPT)|C)D)


-       This  matches  "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
+       This matches "AB", "AAD", or "ACD"; when it matches "AB", "B"  is  cap-
        tured by the outer parentheses.


          (*FAIL) or (*F)


-       This verb causes a matching failure, forcing backtracking to occur.  It
-       is  equivalent to (?!) but easier to read. The Perl documentation notes
-       that it is probably useful only when combined  with  (?{})  or  (??{}).
-       Those  are,  of course, Perl features that are not present in PCRE. The
-       nearest equivalent is the callout feature, as for example in this  pat-
+       This  verb causes a matching failure, forcing backtracking to occur. It
+       is equivalent to (?!) but easier to read. The Perl documentation  notes
+       that  it  is  probably  useful only when combined with (?{}) or (??{}).
+       Those are, of course, Perl features that are not present in  PCRE.  The
+       nearest  equivalent is the callout feature, as for example in this pat-
        tern:


          a+(?C)(*FAIL)


-       A  match  with the string "aaaa" always fails, but the callout is taken
+       A match with the string "aaaa" always fails, but the callout  is  taken
        before each backtrack happens (in this example, 10 times).


    Recording which path was taken


-       There is one verb whose main purpose  is  to  track  how  a  match  was
-       arrived  at,  though  it  also  has a secondary use in conjunction with
+       There  is  one  verb  whose  main  purpose  is to track how a match was
+       arrived at, though it also has a  secondary  use  in  conjunction  with
        advancing the match starting point (see (*SKIP) below).


          (*MARK:NAME) or (*:NAME)


-       A name is always  required  with  this  verb.  There  may  be  as  many
-       instances  of  (*MARK) as you like in a pattern, and their names do not
+       A  name  is  always  required  with  this  verb.  There  may be as many
+       instances of (*MARK) as you like in a pattern, and their names  do  not
        have to be unique.


-       When a match succeeds, the name of the last-encountered (*MARK) on  the
-       matching  path is passed back to the caller as described in the section
-       entitled "Extra data for pcre_exec()"  in  the  pcreapi  documentation.
-       Here  is  an example of pcretest output, where the /K modifier requests
+       When  a match succeeds, the name of the last-encountered (*MARK) on the
+       matching path is passed back to the caller as described in the  section
+       entitled  "Extra  data  for  pcre_exec()" in the pcreapi documentation.
+       Here is an example of pcretest output, where the /K  modifier  requests
        the retrieval and outputting of (*MARK) data:


            re> /X(*MARK:A)Y|X(*MARK:B)Z/K
@@ -6275,63 +6292,63 @@
          MK: B


        The (*MARK) name is tagged with "MK:" in this output, and in this exam-
-       ple  it indicates which of the two alternatives matched. This is a more
-       efficient way of obtaining this information than putting each  alterna-
+       ple it indicates which of the two alternatives matched. This is a  more
+       efficient  way of obtaining this information than putting each alterna-
        tive in its own capturing parentheses.


        If (*MARK) is encountered in a positive assertion, its name is recorded
        and passed back if it is the last-encountered. This does not happen for
        negative assertions.


-       After  a  partial match or a failed match, the name of the last encoun-
+       After a partial match or a failed match, the name of the  last  encoun-
        tered (*MARK) in the entire match process is returned. For example:


            re> /X(*MARK:A)Y|X(*MARK:B)Z/K
          data> XP
          No match, mark = B


-       Note that in this unanchored example the  mark  is  retained  from  the
+       Note  that  in  this  unanchored  example the mark is retained from the
        match attempt that started at the letter "X" in the subject. Subsequent
        match attempts starting at "P" and then with an empty string do not get
        as far as the (*MARK) item, but nevertheless do not reset it.


-       If  you  are  interested  in  (*MARK)  values after failed matches, you
-       should probably set the PCRE_NO_START_OPTIMIZE option  (see  above)  to
+       If you are interested in  (*MARK)  values  after  failed  matches,  you
+       should  probably  set  the PCRE_NO_START_OPTIMIZE option (see above) to
        ensure that the match is always attempted.


    Verbs that act after backtracking


        The following verbs do nothing when they are encountered. Matching con-
-       tinues with what follows, but if there is no subsequent match,  causing
-       a  backtrack  to  the  verb, a failure is forced. That is, backtracking
-       cannot pass to the left of the verb. However, when one of  these  verbs
-       appears  inside  an atomic group, its effect is confined to that group,
-       because once the group has been matched, there is never any  backtrack-
-       ing  into  it.  In  this situation, backtracking can "jump back" to the
-       left of the entire atomic group. (Remember also, as stated above,  that
+       tinues  with what follows, but if there is no subsequent match, causing
+       a backtrack to the verb, a failure is  forced.  That  is,  backtracking
+       cannot  pass  to the left of the verb. However, when one of these verbs
+       appears inside an atomic group, its effect is confined to  that  group,
+       because  once the group has been matched, there is never any backtrack-
+       ing into it. In this situation, backtracking can  "jump  back"  to  the
+       left  of the entire atomic group. (Remember also, as stated above, that
        this localization also applies in subroutine calls and assertions.)


-       These  verbs  differ  in exactly what kind of failure occurs when back-
+       These verbs differ in exactly what kind of failure  occurs  when  back-
        tracking reaches them.


          (*COMMIT)


-       This verb, which may not be followed by a name, causes the whole  match
+       This  verb, which may not be followed by a name, causes the whole match
        to fail outright if the rest of the pattern does not match. Even if the
        pattern is unanchored, no further attempts to find a match by advancing
        the  starting  point  take  place.  Once  (*COMMIT)  has  been  passed,
-       pcre_exec() is committed to finding a match  at  the  current  starting
+       pcre_exec()  is  committed  to  finding a match at the current starting
        point, or not at all. For example:


          a+(*COMMIT)b


-       This  matches  "xxaab" but not "aacaab". It can be thought of as a kind
+       This matches "xxaab" but not "aacaab". It can be thought of as  a  kind
        of dynamic anchor, or "I've started, so I must finish." The name of the
-       most  recently passed (*MARK) in the path is passed back when (*COMMIT)
+       most recently passed (*MARK) in the path is passed back when  (*COMMIT)
        forces a match failure.


-       Note that (*COMMIT) at the start of a pattern is not  the  same  as  an
-       anchor,  unless  PCRE's start-of-match optimizations are turned off, as
+       Note  that  (*COMMIT)  at  the start of a pattern is not the same as an
+       anchor, unless PCRE's start-of-match optimizations are turned  off,  as
        shown in this pcretest example:


            re> /(*COMMIT)abc/
@@ -6340,111 +6357,111 @@
          xyzabc\Y
          No match


-       PCRE knows that any match must start  with  "a",  so  the  optimization
-       skips  along the subject to "a" before running the first match attempt,
-       which succeeds. When the optimization is disabled by the \Y  escape  in
+       PCRE  knows  that  any  match  must start with "a", so the optimization
+       skips along the subject to "a" before running the first match  attempt,
+       which  succeeds.  When the optimization is disabled by the \Y escape in
        the second subject, the match starts at "x" and so the (*COMMIT) causes
        it to fail without trying any other starting points.


          (*PRUNE) or (*PRUNE:NAME)


-       This verb causes the match to fail at the current starting position  in
-       the  subject  if the rest of the pattern does not match. If the pattern
-       is unanchored, the normal "bumpalong"  advance  to  the  next  starting
-       character  then happens. Backtracking can occur as usual to the left of
-       (*PRUNE), before it is reached,  or  when  matching  to  the  right  of
-       (*PRUNE),  but  if  there is no match to the right, backtracking cannot
-       cross (*PRUNE). In simple cases, the use of (*PRUNE) is just an  alter-
-       native  to an atomic group or possessive quantifier, but there are some
+       This  verb causes the match to fail at the current starting position in
+       the subject if the rest of the pattern does not match. If  the  pattern
+       is  unanchored,  the  normal  "bumpalong"  advance to the next starting
+       character then happens. Backtracking can occur as usual to the left  of
+       (*PRUNE),  before  it  is  reached,  or  when  matching to the right of
+       (*PRUNE), but if there is no match to the  right,  backtracking  cannot
+       cross  (*PRUNE). In simple cases, the use of (*PRUNE) is just an alter-
+       native to an atomic group or possessive quantifier, but there are  some
        uses of (*PRUNE) that cannot be expressed in any other way.  The behav-
-       iour  of  (*PRUNE:NAME)  is  the  same  as  (*MARK:NAME)(*PRUNE). In an
+       iour of (*PRUNE:NAME)  is  the  same  as  (*MARK:NAME)(*PRUNE).  In  an
        anchored pattern (*PRUNE) has the same effect as (*COMMIT).


          (*SKIP)


-       This verb, when given without a name, is like (*PRUNE), except that  if
-       the  pattern  is unanchored, the "bumpalong" advance is not to the next
+       This  verb, when given without a name, is like (*PRUNE), except that if
+       the pattern is unanchored, the "bumpalong" advance is not to  the  next
        character, but to the position in the subject where (*SKIP) was encoun-
-       tered.  (*SKIP)  signifies that whatever text was matched leading up to
+       tered. (*SKIP) signifies that whatever text was matched leading  up  to
        it cannot be part of a successful match. Consider:


          a+(*SKIP)b


-       If the subject is "aaaac...",  after  the  first  match  attempt  fails
-       (starting  at  the  first  character in the string), the starting point
+       If  the  subject  is  "aaaac...",  after  the first match attempt fails
+       (starting at the first character in the  string),  the  starting  point
        skips on to start the next attempt at "c". Note that a possessive quan-
-       tifier does not have the same effect as this example; although it would
-       suppress backtracking  during  the  first  match  attempt,  the  second
-       attempt  would  start at the second character instead of skipping on to
+       tifier does not have the same effect as this example; although it would
+       suppress  backtracking  during  the  first  match  attempt,  the second
+       attempt would start at the second character instead of skipping  on  to
        "c".


          (*SKIP:NAME)


-       When (*SKIP) has an associated name, its behaviour is modified. If  the
+       When  (*SKIP) has an associated name, its behaviour is modified. If the
        following pattern fails to match, the previous path through the pattern
-       is searched for the most recent (*MARK) that has the same name. If  one
-       is  found, the "bumpalong" advance is to the subject position that cor-
-       responds to that (*MARK) instead of to where (*SKIP)  was  encountered.
+       is  searched for the most recent (*MARK) that has the same name. If one
+       is found, the "bumpalong" advance is to the subject position that  cor-
+       responds  to  that (*MARK) instead of to where (*SKIP) was encountered.
        If no (*MARK) with a matching name is found, the (*SKIP) is ignored.


          (*THEN) or (*THEN:NAME)


-       This  verb  causes a skip to the next innermost alternative if the rest
-       of the pattern does not match. That is, it cancels  pending  backtrack-
-       ing,  but  only within the current alternative. Its name comes from the
+       This verb causes a skip to the next innermost alternative if  the  rest
+       of  the  pattern does not match. That is, it cancels pending backtrack-
+       ing, but only within the current alternative. Its name comes  from  the
        observation that it can be used for a pattern-based if-then-else block:


          ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...


-       If the COND1 pattern matches, FOO is tried (and possibly further  items
-       after  the  end  of the group if FOO succeeds); on failure, the matcher
-       skips to the second alternative and tries COND2,  without  backtracking
-       into  COND1.  The  behaviour  of  (*THEN:NAME)  is  exactly the same as
-       (*MARK:NAME)(*THEN).  If (*THEN) is not inside an alternation, it  acts
+       If  the COND1 pattern matches, FOO is tried (and possibly further items
+       after the end of the group if FOO succeeds); on  failure,  the  matcher
+       skips  to  the second alternative and tries COND2, without backtracking
+       into COND1. The behaviour  of  (*THEN:NAME)  is  exactly  the  same  as
+       (*MARK:NAME)(*THEN).   If (*THEN) is not inside an alternation, it acts
        like (*PRUNE).


-       Note  that  a  subpattern that does not contain a | character is just a
-       part of the enclosing alternative; it is not a nested alternation  with
-       only  one alternative. The effect of (*THEN) extends beyond such a sub-
-       pattern to the enclosing alternative. Consider this pattern,  where  A,
+       Note that a subpattern that does not contain a | character  is  just  a
+       part  of the enclosing alternative; it is not a nested alternation with
+       only one alternative. The effect of (*THEN) extends beyond such a  sub-
+       pattern  to  the enclosing alternative. Consider this pattern, where A,
        B, etc. are complex pattern fragments that do not contain any | charac-
        ters at this level:


          A (B(*THEN)C) | D


-       If A and B are matched, but there is a failure in C, matching does  not
+       If  A and B are matched, but there is a failure in C, matching does not
        backtrack into A; instead it moves to the next alternative, that is, D.
-       However, if the subpattern containing (*THEN) is given an  alternative,
+       However,  if the subpattern containing (*THEN) is given an alternative,
        it behaves differently:


          A (B(*THEN)C | (*FAIL)) | D


-       The  effect of (*THEN) is now confined to the inner subpattern. After a
+       The effect of (*THEN) is now confined to the inner subpattern. After  a
        failure in C, matching moves to (*FAIL), which causes the whole subpat-
-       tern  to  fail  because  there are no more alternatives to try. In this
+       tern to fail because there are no more alternatives  to  try.  In  this
        case, matching does now backtrack into A.


        Note also that a conditional subpattern is not considered as having two
-       alternatives,  because  only  one  is  ever used. In other words, the |
+       alternatives, because only one is ever used.  In  other  words,  the  |
        character in a conditional subpattern has a different meaning. Ignoring
        white space, consider:


          ^.*? (?(?=a) a | b(*THEN)c )


-       If  the  subject  is  "ba", this pattern does not match. Because .*? is
-       ungreedy, it initially matches zero  characters.  The  condition  (?=a)
-       then  fails,  the  character  "b"  is  matched, but "c" is not. At this
-       point, matching does not backtrack to .*? as might perhaps be  expected
-       from  the  presence  of  the | character. The conditional subpattern is
+       If the subject is "ba", this pattern does not  match.  Because  .*?  is
+       ungreedy,  it  initially  matches  zero characters. The condition (?=a)
+       then fails, the character "b" is matched,  but  "c"  is  not.  At  this
+       point,  matching does not backtrack to .*? as might perhaps be expected
+       from the presence of the | character.  The  conditional  subpattern  is
        part of the single alternative that comprises the whole pattern, and so
-       the  match  fails.  (If  there was a backtrack into .*?, allowing it to
+       the match fails. (If there was a backtrack into  .*?,  allowing  it  to
        match "b", the match would succeed.)


-       The verbs just described provide four different "strengths" of  control
+       The  verbs just described provide four different "strengths" of control
        when subsequent matching fails. (*THEN) is the weakest, carrying on the
-       match at the next alternative. (*PRUNE) comes next, failing  the  match
-       at  the  current starting position, but allowing an advance to the next
-       character (for an unanchored pattern). (*SKIP) is similar, except  that
+       match  at  the next alternative. (*PRUNE) comes next, failing the match
+       at the current starting position, but allowing an advance to  the  next
+       character  (for an unanchored pattern). (*SKIP) is similar, except that
        the advance may be more than one character. (*COMMIT) is the strongest,
        causing the entire match to fail.


@@ -6454,15 +6471,15 @@

          (A(*COMMIT)B(*THEN)C|D)


-       Once A has matched, PCRE is committed to this  match,  at  the  current
-       starting  position. If subsequently B matches, but C does not, the nor-
+       Once  A  has  matched,  PCRE is committed to this match, at the current
+       starting position. If subsequently B matches, but C does not, the  nor-
        mal (*THEN) action of trying the next alternative (that is, D) does not
        happen because (*COMMIT) overrides.



SEE ALSO

-       pcreapi(3),  pcrecallout(3),  pcrematching(3),  pcresyntax(3), pcre(3),
+       pcreapi(3), pcrecallout(3),  pcrematching(3),  pcresyntax(3),  pcre(3),
        pcre16(3).



@@ -6475,11 +6492,11 @@

REVISION

-       Last updated: 14 April 2012
+       Last updated: 01 June 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRESYNTAX(3)                                                    PCRESYNTAX(3)



@@ -6505,7 +6522,7 @@
          \a         alarm, that is, the BEL character (hex 07)
          \cx        "control-x", where x is any ASCII character
          \e         escape (hex 1B)
-         \f         formfeed (hex 0C)
+         \f         form feed (hex 0C)
          \n         newline (hex 0A)
          \r         carriage return (hex 0D)
          \t         tab (hex 09)
@@ -6521,16 +6538,16 @@
          \C         one data unit, even in UTF mode (best avoided)
          \d         a decimal digit
          \D         a character that is not a decimal digit
-         \h         a horizontal whitespace character
-         \H         a character that is not a horizontal whitespace character
+         \h         a horizontal white space character
+         \H         a character that is not a horizontal white space character
          \N         a character that is not a newline
          \p{xx}     a character with the xx property
          \P{xx}     a character without the xx property
          \R         a newline sequence
-         \s         a whitespace character
-         \S         a character that is not a whitespace character
-         \v         a vertical whitespace character
-         \V         a character that is not a vertical whitespace character
+         \s         a white space character
+         \S         a character that is not a white space character
+         \v         a vertical white space character
+         \V         a character that is not a vertical white space character
          \w         a "word" character
          \W         a "non-word" character
          \X         an extended Unicode sequence
@@ -6634,7 +6651,7 @@
          lower       lower case letter
          print       printing, including space
          punct       printing, excluding alphanumeric
-         space       whitespace
+         space       white space
          upper       upper case letter
          word        same as \w
          xdigit      hexadecimal digit
@@ -6856,8 +6873,8 @@
        Last updated: 10 January 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREUNICODE(3)                                                  PCREUNICODE(3)



@@ -6935,7 +6952,7 @@

        If an invalid UTF-8 string is passed to PCRE, an error return is given.
        At compile time, the only additional information is the offset  to  the
-       first  byte of the failing character. The runtime functions pcre_exec()
+       first byte of the failing character. The run-time functions pcre_exec()
        and pcre_dfa_exec() also pass back this information, as well as a  more
        detailed  reason  code if the caller has provided memory in which to do
        this.
@@ -6976,7 +6993,7 @@


        If an invalid UTF-16 string is passed  to  PCRE,  an  error  return  is
        given.  At  compile time, the only additional information is the offset
-       to the first data unit of the failing character. The runtime  functions
+       to the first data unit of the failing character. The run-time functions
        pcre16_exec() and pcre16_dfa_exec() also pass back this information, as
        well as a more detailed reason code if the caller has  provided  memory
        in which to do this.
@@ -7030,7 +7047,7 @@
        7.  Similarly,  characters that match the POSIX named character classes
        are all low-valued characters, unless the PCRE_UCP option is set.


-       8. However, the horizontal and  vertical  whitespace  matching  escapes
+       8. However, the horizontal and vertical white  space  matching  escapes
        (\h,  \H,  \v, and \V) do match all the appropriate Unicode characters,
        whether or not PCRE_UCP is set.


@@ -7057,8 +7074,8 @@
        Last updated: 14 April 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREJIT(3)                                                          PCREJIT(3)



@@ -7209,10 +7226,8 @@

          \C             match a single byte; not supported in UTF-8 mode
          (?Cn)          callouts
-         (*COMMIT)      )
-         (*MARK)        )
-         (*PRUNE)       ) the backtracking control verbs
-         (*SKIP)        )
+         (*PRUNE)       )
+         (*SKIP)        ) backtracking control verbs
          (*THEN)        )


        Support for some of these may be added in future.
@@ -7441,11 +7456,11 @@


REVISION

-       Last updated: 14 April 2012
+       Last updated: 04 May 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREPARTIAL(3)                                                  PCREPARTIAL(3)



@@ -7894,8 +7909,8 @@
        Last updated: 24 February 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREPRECOMPILE(3)                                            PCREPRECOMPILE(3)



@@ -8029,8 +8044,8 @@
        Last updated: 10 January 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREPERFORM(3)                                                  PCREPERFORM(3)



@@ -8199,8 +8214,8 @@
        Last updated: 09 January 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCREPOSIX(3)                                                      PCREPOSIX(3)



@@ -8463,8 +8478,8 @@
        Last updated: 09 January 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRECPP(3)                                                          PCRECPP(3)



@@ -8641,7 +8656,7 @@
           PCRE_DOTALL           dot matches newlines        /s
           PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
           PCRE_EXTRA            strict escape parsing       N/A
-          PCRE_EXTENDED         ignore whitespaces          /x
+          PCRE_EXTENDED         ignore white spaces         /x
           PCRE_UTF8             handles UTF8 chars          built-in
           PCRE_UNGREEDY         reverses * and *?           N/A
           PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
@@ -8805,8 +8820,8 @@


        Last updated: 08 January 2012
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRESAMPLE(3)                                                    PCRESAMPLE(3)



@@ -8929,6 +8944,10 @@
        The maximum length of name for a named subpattern is 32 characters, and
        the maximum number of named subpatterns is 10000.


+       The maximum length of a  name  in  a  (*MARK),  (*PRUNE),  (*SKIP),  or
+       (*THEN)  verb  is  255  for  the 8-bit library and 65535 for the 16-bit
+       library.
+
        The maximum length of a subject string is the largest  positive  number
        that  an integer variable can hold. However, when using the traditional
        matching function, PCRE uses recursion to handle subpatterns and indef-
@@ -8946,11 +8965,11 @@


REVISION

-       Last updated: 08 January 2012
+       Last updated: 04 May 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
 PCRESTACK(3)                                                      PCRESTACK(3)



@@ -9134,5 +9153,5 @@
        Last updated: 21 January 2012
        Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------
- 
- 
+
+
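As an illustration of the (*SKIP) text in the pcre.txt diff above, the sketch below is a toy model of the "bumpalong" advance for the two patterns that passage compares, a+(*SKIP)b and the possessive a++b. It is not PCRE code and does not call the library: the function name start_positions and its use_skip flag are invented for this example, the matcher is hand-written for this one pattern shape only, and start-of-match optimizations are ignored.

```python
def start_positions(subject, use_skip):
    """Return (offsets tried, match span or None) for "a+ then b".

    use_skip=True  models a+(*SKIP)b
    use_skip=False models the possessive a++b
    Toy model only; hypothetical helper, not part of PCRE.
    """
    tried = []
    start = 0
    # Like an unanchored match, also try the empty string at the very end.
    while start <= len(subject):
        tried.append(start)
        # a+ : consume the longest run of "a". Neither variant gives any
        # of these characters back once the run has been taken.
        pos = start
        while pos < len(subject) and subject[pos] == "a":
            pos += 1
        if pos > start and subject[pos:pos + 1] == "b":
            return tried, (start, pos + 1)
        if use_skip and pos > start:
            # (*SKIP) was passed at offset pos, so the next attempt
            # starts there instead of one character further on.
            start = pos
        else:
            # Ordinary bumpalong: advance by one character.
            start += 1
    return tried, None


if __name__ == "__main__":
    print(start_positions("aaaac", use_skip=True))
    print(start_positions("aaaac", use_skip=False))
```

With the subject "aaaac", the (*SKIP) model jumps from offset 0 straight to the "c" at offset 4, whereas the possessive model retries at offsets 1, 2, and 3 first.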


Modified: code/trunk/doc/pcre16.3
===================================================================
--- code/trunk/doc/pcre16.3    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcre16.3    2012-06-02 11:03:06 UTC (rev 975)
@@ -264,7 +264,7 @@
 .sp
 There are two new general option names, PCRE_UTF16 and PCRE_NO_UTF16_CHECK,
 which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In
-fact, these new options define the same bits in the options word. There is a 
+fact, these new options define the same bits in the options word. There is a
 discussion about the
 .\" HTML <a href="pcreunicode.html#utf16strings">
 .\" </a>
@@ -274,7 +274,7 @@
 .\" HREF
 \fBpcreunicode\fP
 .\"
-page. 
+page.
 .P
 For the \fBpcre16_config()\fP function there is an option PCRE_CONFIG_UTF16
 that returns 1 if UTF-16 support is configured, otherwise 0. If this option is


Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcreapi.3    2012-06-02 11:03:06 UTC (rev 975)
@@ -926,7 +926,7 @@
   72  too many forward references
   73  disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
   74  invalid UTF-16 string (specifically UTF-16)
-  75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN) 
+  75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
 .sp
 The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
 be used if the limits were changed when PCRE was built.
@@ -964,12 +964,12 @@
 \fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block.
 .P
 The second argument of \fBpcre_study()\fP contains option bits. There are three
-options: 
+options:
 .sp
   PCRE_STUDY_JIT_COMPILE
   PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
   PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
-.sp  
+.sp
 If any of these are set, and the just-in-time compiler is available, the
 pattern is further compiled into machine code that executes much faster than
 the \fBpcre_exec()\fP interpretive matching function. If the just-in-time
@@ -1240,7 +1240,7 @@
 .sp
 Return the number of characters (NB not bytes) in the longest lookbehind
 assertion in the pattern. Note that the simple assertions \eb and \eB require a
-one-character lookbehind. This information is useful when doing multi-segment 
+one-character lookbehind. This information is useful when doing multi-segment
 matching using the partial matching facilities.
 .sp
   PCRE_INFO_MINLENGTH
@@ -1524,7 +1524,7 @@
 Limiting the recursion depth limits the amount of machine stack that can be
 used, or, when PCRE has been compiled to use memory on the heap instead of the
 stack, the amount of heap memory that can be used. This limit is not relevant,
-and is ignored, when matching is done using JIT compiled code. 
+and is ignored, when matching is done using JIT compiled code.
 .P
 The default value for \fImatch_limit_recursion\fP can be set when PCRE is
 built; the default default is the same value as the default for
@@ -1708,7 +1708,7 @@
 "no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
 are considered at every possible starting position in the subject string. If
 PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching
-time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set, 
+time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set,
 matching is always done interpretively.
 .P
 Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.
@@ -2639,9 +2639,9 @@
   PCRE_ERROR_DFA_BADRESTART (-30)
 .sp
 When \fBpcre_dfa_exec()\fP is called with the \fBPCRE_DFA_RESTART\fP option,
-some plausibility checks are made on the contents of the workspace, which 
-should contain data about the previous partial match. If any of these checks 
-fail, this error is given.   
+some plausibility checks are made on the contents of the workspace, which
+should contain data about the previous partial match. If any of these checks
+fail, this error is given.
 .
 .
 .SH "SEE ALSO"


Modified: code/trunk/doc/pcregrep.1
===================================================================
--- code/trunk/doc/pcregrep.1    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcregrep.1    2012-06-02 11:03:06 UTC (rev 975)
@@ -98,7 +98,7 @@
 .SH "BINARY FILES"
 .rs
 .sp
-By default, a file that contains a binary zero byte within the first 1024 bytes 
+By default, a file that contains a binary zero byte within the first 1024 bytes
 is identified as a binary file, and is processed specially. (GNU grep also
 identifies binary files in this manner.) See the \fB--binary-files\fP option
 for a means of changing the way binary files are handled.
@@ -139,7 +139,7 @@
 guarantees to have up to 8K of preceding text available for context output.
 .TP
 \fB--binary-files=\fP\fIword\fP
-Specify how binary files are to be processed. If the word is "binary" (the 
+Specify how binary files are to be processed. If the word is "binary" (the
 default), pattern matching is performed on binary files, but the only output is
 "Binary file <name> matches" when a match succeeds. If the word is "text",
 which is equivalent to the \fB-a\fP or \fB--text\fP option, binary files are
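The binary-file rule described in the pcregrep.1 text above (a zero byte within the first 1024 bytes) can be sketched as follows. This is a minimal illustration of the stated rule, not pcregrep's actual implementation; the names looks_binary and file_looks_binary are invented for the example.

```python
def looks_binary(data: bytes, window: int = 1024) -> bool:
    """Return True if a zero byte occurs within the first `window` bytes.

    Hypothetical helper that mirrors the rule in the man page text.
    """
    return b"\x00" in data[:window]


def file_looks_binary(path: str, window: int = 1024) -> bool:
    """Apply the same rule to the start of a file on disk."""
    with open(path, "rb") as handle:
        return looks_binary(handle.read(window), window)


if __name__ == "__main__":
    print(looks_binary(b"plain text\n"))
    print(looks_binary(b"ELF\x00\x01\x02"))
```

A zero byte that first appears after the 1024-byte window does not trigger the rule in this sketch, which matches the wording "within the first 1024 bytes".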


Modified: code/trunk/doc/pcrejit.3
===================================================================
--- code/trunk/doc/pcrejit.3    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcrejit.3    2012-06-02 11:03:06 UTC (rev 975)
@@ -82,7 +82,7 @@
       pcre_free(study_ptr);
   #endif
 .sp
-PCRE_STUDY_JIT_COMPILE requests the JIT compiler to generate code for complete 
+PCRE_STUDY_JIT_COMPILE requests the JIT compiler to generate code for complete
 matches. If you want to run partial matches using the PCRE_PARTIAL_HARD or
 PCRE_PARTIAL_SOFT options of \fBpcre_exec()\fP, you should set one or both of
 the following options in addition to, or instead of, PCRE_STUDY_JIT_COMPILE
@@ -108,7 +108,7 @@
 no JIT data is created. Otherwise, the compiled pattern is passed to the JIT
 compiler, which turns it into machine code that executes much faster than the
 normal interpretive code. When \fBpcre_exec()\fP is passed a \fBpcre_extra\fP
-block containing a pointer to JIT code of the appropriate mode (normal or 
+block containing a pointer to JIT code of the appropriate mode (normal or
 hard/soft partial), it obeys that code instead of running the interpreter. The
 result is identical, but the compiled JIT code runs much faster.
 .P
@@ -149,7 +149,7 @@
 .sp
   \eC             match a single byte; not supported in UTF-8 mode
   (?Cn)          callouts
-  (*PRUNE)       ) 
+  (*PRUNE)       )
   (*SKIP)        ) backtracking control verbs
   (*THEN)        )
 .sp
@@ -239,24 +239,24 @@
   (2) If \fIcallback\fP is NULL and \fIdata\fP is not NULL, \fIdata\fP must be
       a valid JIT stack, the result of calling \fBpcre_jit_stack_alloc()\fP.
 .sp
-  (3) If \fIcallback\fP is not NULL, it must point to a function that is 
-      called with \fIdata\fP as an argument at the start of matching, in 
-      order to set up a JIT stack. If the return from the callback 
-      function is NULL, the internal 32K stack is used; otherwise the 
-      return value must be a valid JIT stack, the result of calling 
+  (3) If \fIcallback\fP is not NULL, it must point to a function that is
+      called with \fIdata\fP as an argument at the start of matching, in
+      order to set up a JIT stack. If the return from the callback
+      function is NULL, the internal 32K stack is used; otherwise the
+      return value must be a valid JIT stack, the result of calling
       \fBpcre_jit_stack_alloc()\fP.
 .sp
-A callback function is obeyed whenever JIT code is about to be run; it is not 
-obeyed when \fBpcre_exec()\fP is called with options that are incompatible for 
+A callback function is obeyed whenever JIT code is about to be run; it is not
+obeyed when \fBpcre_exec()\fP is called with options that are incompatible for
 JIT execution. A callback function can therefore be used to determine whether a
 match operation was executed by JIT or by the interpreter.
 .P
 You may safely use the same JIT stack for more than one pattern (either by
 assigning directly or by callback), as long as the patterns are all matched
 sequentially in the same thread. In a multithread application, if you do not
-specify a JIT stack, or if you assign or pass back NULL from a callback, that 
-is thread-safe, because each thread has its own machine stack. However, if you 
-assign or pass back a non-NULL JIT stack, this must be a different stack for 
+specify a JIT stack, or if you assign or pass back NULL from a callback, that
+is thread-safe, because each thread has its own machine stack. However, if you
+assign or pass back a non-NULL JIT stack, this must be a different stack for
 each thread so that the application is thread-safe.
 .P
 Strictly speaking, even more is allowed. You can assign the same non-NULL stack
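
The stack selection rules in this hunk can be modelled in a few lines of C. This is only a sketch: the types and function names below are stand-ins invented for illustration (the real ones are pcre_jit_stack and pcre_jit_callback in pcre.h), and it assumes that rule (1), which lies outside the hunk, says that a NULL callback with NULL data selects the internal 32K stack.

```c
#include <stddef.h>

/* Stand-in types for illustration only; the real ones are pcre_jit_stack
   and pcre_jit_callback from pcre.h. */
typedef struct jit_stack_model { int id; } jit_stack_model;
typedef jit_stack_model *(*stack_callback_model)(void *);

/* Returns the stack that JIT code would run on, or NULL to mean
   "use the internal 32K stack". */
jit_stack_model *select_jit_stack(stack_callback_model callback, void *data)
{
  if (callback == NULL)
    return (jit_stack_model *)data;   /* NULL data selects the internal stack */
  return callback(data);              /* a NULL return selects it as well */
}

/* Example callback: treats data as a pointer to the caller's own stack
   slot and hands back whatever is stored there (possibly NULL). */
jit_stack_model *stack_from_slot(void *data)
{
  return *(jit_stack_model **)data;
}
```

Giving each thread its own slot is one way to meet the requirement that a non-NULL JIT stack is never shared between threads that match at the same time.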


Modified: code/trunk/doc/pcrepartial.3
===================================================================
--- code/trunk/doc/pcrepartial.3    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcrepartial.3    2012-06-02 11:03:06 UTC (rev 975)
@@ -32,14 +32,14 @@
 the details differ between the two types of matching function. If both options
 are set, PCRE_PARTIAL_HARD takes precedence.
 .P
-If you want to use partial matching with just-in-time optimized code, you must 
+If you want to use partial matching with just-in-time optimized code, you must
 call \fBpcre_study()\fP or \fBpcre16_study()\fP with one or both of these
 options:
 .sp
   PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
   PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
 .sp
-PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial 
+PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial
 matches on the same pattern. If the appropriate JIT study mode has not been set
 for a match, the interpretive matching code is used.
 .P
@@ -328,8 +328,8 @@
 .P
 2. Lookbehind assertions that have already been obeyed are catered for in the
 offsets that are returned for a partial match. However a lookbehind assertion
-later in the pattern could require even earlier characters to be inspected. You 
-can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the 
+later in the pattern could require even earlier characters to be inspected. You
+can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the
 \fBpcre_fullinfo()\fP or \fBpcre16_fullinfo()\fP functions to obtain the length
 of the largest lookbehind in the pattern. This length is given in characters,
 not bytes. If you always retain at least that many characters before the
@@ -345,7 +345,7 @@
   data> ab\eP
   No match
 .sp
-If the next segment begins "cx", a match should be found, but this will only 
+If the next segment begins "cx", a match should be found, but this will only
 happen if characters from the previous segment are retained. For this reason, a
 "no match" result should be interpreted as "partial match of an empty string"
 when the pattern contains lookbehinds.
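
The retention rule for multi-segment matching can be expressed as a small calculation. The sketch below uses a hypothetical helper name and assumes one code unit per character (no UTF mode), so the lookbehind length reported for PCRE_INFO_MAXLOOKBEHIND can be subtracted directly from an offset.

```c
#include <stddef.h>

/* Hypothetical helper: given the offset at which a partial match started
   and the pattern's maximum lookbehind length, return the offset of the
   first character that should be kept for the next segment.
   Assumes one code unit per character (no UTF mode). */
size_t first_offset_to_retain(size_t partial_start, size_t max_lookbehind)
{
  return (partial_start > max_lookbehind) ? partial_start - max_lookbehind : 0;
}
```

For a "no match" result on a pattern that contains lookbehinds, passing the segment length as partial_start is one way to follow the advice above, since it treats the result as a partial match of an empty string at the end of the segment.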


Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcrepattern.3    2012-06-02 11:03:06 UTC (rev 975)
@@ -2633,7 +2633,7 @@
 .\" HREF
 \fBpcreapi\fP
 .\"
-documentation. 
+documentation.
 .P
 Experiments with Perl suggest that it too has similar optimizations, sometimes
 leading to anomalous results.
@@ -2727,10 +2727,10 @@
 (*MARK) item, but nevertheless do not reset it.
 .P
 If you are interested in (*MARK) values after failed matches, you should
-probably set the PCRE_NO_START_OPTIMIZE option 
+probably set the PCRE_NO_START_OPTIMIZE option
 .\" HTML <a href="#nooptimize">
 .\" </a>
-(see above) 
+(see above)
 .\"
 to ensure that the match is always attempted.
 .


Modified: code/trunk/doc/pcretest.1
===================================================================
--- code/trunk/doc/pcretest.1    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcretest.1    2012-06-02 11:03:06 UTC (rev 975)
@@ -131,8 +131,8 @@
 Behave as if each pattern has the \fB/S\fP modifier; in other words, force each
 pattern to be studied. If \fB-s+\fP is used, all the JIT compile options are
 passed to \fBpcre[16]_study()\fP, causing just-in-time optimization to be set
-up if it is available, for both full and partial matching. Specific JIT compile 
-options can be selected by following \fB-s+\fP with a digit in the range 1 to 
+up if it is available, for both full and partial matching. Specific JIT compile
+options can be selected by following \fB-s+\fP with a digit in the range 1 to
 7, which selects the JIT compile modes as follows:
 .sp
   1  normal match only
@@ -141,7 +141,7 @@
   4  hard partial match only
   6  soft and hard partial match
   7  all three modes (default)
-.sp        
+.sp
 If \fB-s++\fP is used instead of \fB-s+\fP (with or without a following digit),
 the text "(JIT)" is added to the first output line after a match or no match
 when JIT-compiled code was actually used.
@@ -402,7 +402,7 @@
 If the \fB/S\fP modifier is immediately followed by a + character, the call to
 \fBpcre[16]_study()\fP is made with all the JIT study options, requesting
 just-in-time optimization support if it is available, for both normal and
-partial matching. If you want to restrict the JIT compiling modes, you can 
+partial matching. If you want to restrict the JIT compiling modes, you can
 follow \fB/S+\fP with a digit in the range 1 to 7:
 .sp
   1  normal match only
@@ -411,13 +411,13 @@
   4  hard partial match only
   6  soft and hard partial match
   7  all three modes (default)
-.sp        
+.sp
 If \fB/S++\fP is used instead of \fB/S+\fP (with or without a following digit),
 the text "(JIT)" is added to the first output line after a match or no match
 when JIT-compiled code was actually used.
 .P
 Note that there is also an independent \fB/+\fP modifier; it must not be given
-immediately after \fB/S\fP or \fB/S+\fP because this will be misinterpreted. 
+immediately after \fB/S\fP or \fB/S+\fP because this will be misinterpreted.
 .P
 If JIT studying is successful, the compiled JIT code will automatically be used
 when \fBpcre[16]_exec()\fP is run, except when incompatible run-time options
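
The digit that may follow -s+ or /S+ reads naturally as a bit mask. The sketch below is illustrative only: the enum names and the decoding function are hypothetical and are not taken from pcretest.c, and the value 2 for soft partial matching is an assumption inferred from "6 = soft and hard", because that table row falls outside the hunks shown here.

```c
/* Illustrative bit values only; they mirror the digit table, not the
   PCRE_STUDY_* constants in pcre.h. */
enum {
  MODE_NORMAL       = 1,
  MODE_SOFT_PARTIAL = 2,  /* assumed from "6 = soft and hard" */
  MODE_HARD_PARTIAL = 4
};

/* Returns the selected mode bits for a digit character '1'..'7',
   7 (all modes) when ch is 0, meaning no digit follows, or -1 for
   any other character. */
int jit_modes_from_digit(int ch)
{
  if (ch == 0) return 7;
  if (ch >= '1' && ch <= '7') return ch - '0';
  return -1;
}
```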


Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcre_compile.c    2012-06-02 11:03:06 UTC (rev 975)
@@ -490,7 +490,7 @@
   "disallowed Unicode code point (>= 0xd800 && <= 0xdfff)\0"
   "invalid UTF-16 string\0"
   /* 75 */
-  "name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)\0"  
+  "name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)\0"
   ;


 /* Table to identify digits and hex digits. This is used when compiling
@@ -4518,7 +4518,7 @@
       LONE_SINGLE_CHARACTER:


       /* Only the value of 1 matters for class_single_char. */
-       
+
       if (class_single_char < 2) class_single_char++;


       /* If class_charcount is 1, we saw precisely one character. As long as
@@ -4813,7 +4813,7 @@
     if (*previous == OP_CHAR || *previous == OP_CHARI
         || *previous == OP_NOT || *previous == OP_NOTI)
       {
-      switch (*previous) 
+      switch (*previous)
         {
         default: /* Make compiler happy. */
         case OP_CHAR:  op_type = OP_STAR - OP_STAR; break;
@@ -5593,7 +5593,7 @@
       ptr++;
       while (MAX_255(*ptr) && (cd->ctypes[*ptr] & ctype_letter) != 0) ptr++;
       namelen = (int)(ptr - name);
-      
+
       /* It appears that Perl allows any characters whatsoever, other than
       a closing parenthesis, to appear in arguments, so we no longer insist on
       letters, digits, and underscores. */
@@ -5607,7 +5607,7 @@
           {
           *errorcodeptr = ERR75;
           goto FAILED;
-          }     
+          }
         }


       if (*ptr != CHAR_RIGHT_PARENTHESIS)
@@ -6859,13 +6859,13 @@
       /* For the rest (including \X when Unicode properties are supported), we
       can obtain the OP value by negating the escape value in the default
       situation when PCRE_UCP is not set. When it *is* set, we substitute
-      Unicode property tests. Note that \b and \B do a one-character 
+      Unicode property tests. Note that \b and \B do a one-character
       lookbehind. */


       else
         {
         if ((-c == ESC_b || -c == ESC_B) && cd->max_lookbehind == 0)
-          cd->max_lookbehind = 1; 
+          cd->max_lookbehind = 1;
 #ifdef SUPPORT_UCP
         if (-c >= ESC_DU && -c <= ESC_wu)
           {
@@ -7173,11 +7173,11 @@
         *ptrptr = ptr;
         return FALSE;
         }
-      else 
-        { 
-        if (fixed_length > cd->max_lookbehind) 
-          cd->max_lookbehind = fixed_length; 
-        PUT(reverse_count, 0, fixed_length); 
+      else
+        {
+        if (fixed_length > cd->max_lookbehind)
+          cd->max_lookbehind = fixed_length;
+        PUT(reverse_count, 0, fixed_length);
         }
       }
     }


Modified: code/trunk/pcre_dfa_exec.c
===================================================================
--- code/trunk/pcre_dfa_exec.c    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcre_dfa_exec.c    2012-06-02 11:03:06 UTC (rev 975)
@@ -573,10 +573,10 @@
   int clen, dlen;
   unsigned int c, d;
   int forced_fail = 0;
-  BOOL partial_newline = FALSE; 
+  BOOL partial_newline = FALSE;
   BOOL could_continue = reset_could_continue;
-  reset_could_continue = FALSE; 
-  
+  reset_could_continue = FALSE;
+
   /* Make the new state list into the active state list and empty the
   new state list. */


@@ -645,7 +645,7 @@

     /* A negative offset is a special case meaning "hold off going to this
     (negated) state until the number of characters in the data field have
-    been skipped". If the could_continue flag was passed over from a previous 
+    been skipped". If the could_continue flag was passed over from a previous
     state, arrange for it to passed on. */


     if (state_offset < 0)
@@ -695,7 +695,7 @@
     permitted.


     We also use this mechanism for opcodes such as OP_TYPEPLUS that take an
-    argument that is not a data character - but is always one byte long because 
+    argument that is not a data character - but is always one byte long because
     the values are small. We have to take special action to deal with  \P, \p,
     \H, \h, \V, \v and \X in this case. To keep the other cases fast, convert
     these ones to new opcodes. */
@@ -894,19 +894,19 @@
       /*-----------------------------------------------------------------*/
       case OP_ANY:
       if (clen > 0 && !IS_NEWLINE(ptr))
-        { 
+        {
         if (ptr + 1 >= md->end_subject &&
             (md->moptions & (PCRE_PARTIAL_HARD)) != 0 &&
             NLBLOCK->nltype == NLTYPE_FIXED &&
-            NLBLOCK->nllen == 2 && 
+            NLBLOCK->nllen == 2 &&
             c == NLBLOCK->nl[0])
           {
-          could_continue = partial_newline = TRUE;          
-          } 
+          could_continue = partial_newline = TRUE;
+          }
         else
-          { 
-          ADD_NEW(state_offset + 1, 0); 
-          } 
+          {
+          ADD_NEW(state_offset + 1, 0);
+          }
         }
       break;


@@ -938,16 +938,16 @@
         else if (ptr + 1 >= md->end_subject &&
                  (md->moptions & (PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT)) != 0 &&
                  NLBLOCK->nltype == NLTYPE_FIXED &&
-                 NLBLOCK->nllen == 2 && 
+                 NLBLOCK->nllen == 2 &&
                  c == NLBLOCK->nl[0])
           {
           if ((md->moptions & PCRE_PARTIAL_HARD) != 0)
             {
             reset_could_continue = TRUE;
-            ADD_NEW_DATA(-(state_offset + 1), 0, 1);  
-            }  
-          else could_continue = partial_newline = TRUE; 
-          } 
+            ADD_NEW_DATA(-(state_offset + 1), 0, 1);
+            }
+          else could_continue = partial_newline = TRUE;
+          }
         }
       break;


@@ -963,16 +963,16 @@
         else if (ptr + 1 >= md->end_subject &&
                  (md->moptions & (PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT)) != 0 &&
                  NLBLOCK->nltype == NLTYPE_FIXED &&
-                 NLBLOCK->nllen == 2 && 
+                 NLBLOCK->nllen == 2 &&
                  c == NLBLOCK->nl[0])
           {
           if ((md->moptions & PCRE_PARTIAL_HARD) != 0)
             {
             reset_could_continue = TRUE;
-            ADD_NEW_DATA(-(state_offset + 1), 0, 1);  
-            }  
-          else could_continue = partial_newline = TRUE; 
-          } 
+            ADD_NEW_DATA(-(state_offset + 1), 0, 1);
+            }
+          else could_continue = partial_newline = TRUE;
+          }
         }
       else if (IS_NEWLINE(ptr))
         { ADD_ACTIVE(state_offset + 1, 0); }
@@ -1138,11 +1138,11 @@
         if (d == OP_ANY && ptr + 1 >= md->end_subject &&
             (md->moptions & (PCRE_PARTIAL_HARD)) != 0 &&
             NLBLOCK->nltype == NLTYPE_FIXED &&
-            NLBLOCK->nllen == 2 && 
+            NLBLOCK->nllen == 2 &&
             c == NLBLOCK->nl[0])
           {
-          could_continue = partial_newline = TRUE;          
-          } 
+          could_continue = partial_newline = TRUE;
+          }
         else if ((c >= 256 && d != OP_DIGIT && d != OP_WHITESPACE && d != OP_WORDCHAR) ||
             (c < 256 &&
               (d != OP_ANY || !IS_NEWLINE(ptr)) &&
@@ -1169,11 +1169,11 @@
         if (d == OP_ANY && ptr + 1 >= md->end_subject &&
             (md->moptions & (PCRE_PARTIAL_HARD)) != 0 &&
             NLBLOCK->nltype == NLTYPE_FIXED &&
-            NLBLOCK->nllen == 2 && 
+            NLBLOCK->nllen == 2 &&
             c == NLBLOCK->nl[0])
           {
-          could_continue = partial_newline = TRUE;          
-          } 
+          could_continue = partial_newline = TRUE;
+          }
         else if ((c >= 256 && d != OP_DIGIT && d != OP_WHITESPACE && d != OP_WORDCHAR) ||
             (c < 256 &&
               (d != OP_ANY || !IS_NEWLINE(ptr)) &&
@@ -1199,11 +1199,11 @@
         if (d == OP_ANY && ptr + 1 >= md->end_subject &&
             (md->moptions & (PCRE_PARTIAL_HARD)) != 0 &&
             NLBLOCK->nltype == NLTYPE_FIXED &&
-            NLBLOCK->nllen == 2 && 
+            NLBLOCK->nllen == 2 &&
             c == NLBLOCK->nl[0])
           {
-          could_continue = partial_newline = TRUE;          
-          } 
+          could_continue = partial_newline = TRUE;
+          }
         else if ((c >= 256 && d != OP_DIGIT && d != OP_WHITESPACE && d != OP_WORDCHAR) ||
             (c < 256 &&
               (d != OP_ANY || !IS_NEWLINE(ptr)) &&
@@ -1227,11 +1227,11 @@
         if (d == OP_ANY && ptr + 1 >= md->end_subject &&
             (md->moptions & (PCRE_PARTIAL_HARD)) != 0 &&
             NLBLOCK->nltype == NLTYPE_FIXED &&
-            NLBLOCK->nllen == 2 && 
+            NLBLOCK->nllen == 2 &&
             c == NLBLOCK->nl[0])
           {
-          could_continue = partial_newline = TRUE;          
-          } 
+          could_continue = partial_newline = TRUE;
+          }
         else if ((c >= 256 && d != OP_DIGIT && d != OP_WHITESPACE && d != OP_WORDCHAR) ||
             (c < 256 &&
               (d != OP_ANY || !IS_NEWLINE(ptr)) &&
@@ -1256,11 +1256,11 @@
         if (d == OP_ANY && ptr + 1 >= md->end_subject &&
             (md->moptions & (PCRE_PARTIAL_HARD)) != 0 &&
             NLBLOCK->nltype == NLTYPE_FIXED &&
-            NLBLOCK->nllen == 2 && 
+            NLBLOCK->nllen == 2 &&
             c == NLBLOCK->nl[0])
           {
-          could_continue = partial_newline = TRUE;          
-          } 
+          could_continue = partial_newline = TRUE;
+          }
         else if ((c >= 256 && d != OP_DIGIT && d != OP_WHITESPACE && d != OP_WORDCHAR) ||
             (c < 256 &&
               (d != OP_ANY || !IS_NEWLINE(ptr)) &&
@@ -1909,8 +1909,8 @@
           ncount++;
           nptr += ndlen;
           }
-        if (nptr >= end_subject && (md->moptions & PCRE_PARTIAL_HARD) != 0) 
-            reset_could_continue = TRUE; 
+        if (nptr >= end_subject && (md->moptions & PCRE_PARTIAL_HARD) != 0)
+            reset_could_continue = TRUE;
         if (++count >= GET2(code, 1))
           { ADD_NEW_DATA(-(state_offset + 2 + IMM2_SIZE), 0, ncount); }
         else
@@ -2124,8 +2124,8 @@
           ncount++;
           nptr += nclen;
           }
-        if (nptr >= end_subject && (md->moptions & PCRE_PARTIAL_HARD) != 0) 
-            reset_could_continue = TRUE; 
+        if (nptr >= end_subject && (md->moptions & PCRE_PARTIAL_HARD) != 0)
+            reset_could_continue = TRUE;
         ADD_NEW_DATA(-(state_offset + 1), 0, ncount);
         }
       break;
@@ -2151,20 +2151,20 @@
         break;


         case 0x000d:
-        if (ptr + 1 >= end_subject) 
+        if (ptr + 1 >= end_subject)
           {
-          ADD_NEW(state_offset + 1, 0); 
-          if ((md->moptions & PCRE_PARTIAL_HARD) != 0) 
-            reset_could_continue = TRUE; 
-          }  
+          ADD_NEW(state_offset + 1, 0);
+          if ((md->moptions & PCRE_PARTIAL_HARD) != 0)
+            reset_could_continue = TRUE;
+          }
         else if (ptr[1] == 0x0a)
           {
           ADD_NEW_DATA(-(state_offset + 1), 0, 1);
           }
         else
-          { 
+          {
           ADD_NEW(state_offset + 1, 0);
-          } 
+          }
         break;
         }
       break;
@@ -2277,7 +2277,7 @@


       case OP_NOTI:
       if (clen > 0)
-        { 
+        {
         unsigned int otherd;
 #ifdef SUPPORT_UTF
         if (utf && d >= 128)
@@ -2291,7 +2291,7 @@
         otherd = TABLE_GET(d, fcc, d);
         if (c != d && c != otherd)
           { ADD_NEW(state_offset + dlen + 1, 0); }
-        }   
+        }
       break;


       /*-----------------------------------------------------------------*/
@@ -3047,7 +3047,7 @@


   The "could_continue" variable is true if a state could have continued but
   for the fact that the end of the subject was reached. */
-  
+
   if (new_count <= 0)
     {
     if (rlevel == 1 &&                               /* Top level, and */
@@ -3064,8 +3064,8 @@
           (                                          /* or ... */
           ptr >= end_subject &&                /* End of subject and */
           ptr > md->start_used_ptr)            /* Inspected non-empty string */
-          ) 
-        )   
+          )
+        )
       {
       if (offsetcount >= 2)
         {
@@ -3172,15 +3172,15 @@
     PCRE_ERROR_BADENDIANNESS:PCRE_ERROR_BADMAGIC;
 if ((re->flags & PCRE_MODE) == 0) return PCRE_ERROR_BADMODE;


-/* If restarting after a partial match, do some sanity checks on the contents
+/* If restarting after a partial match, do some sanity checks on the contents
 of the workspace. */

 if ((options & PCRE_DFA_RESTART) != 0)
   {
-  if ((workspace[0] & (-2)) != 0 || workspace[1] < 1 || 
+  if ((workspace[0] & (-2)) != 0 || workspace[1] < 1 ||
     workspace[1] > (wscount - 2)/INTS_PER_STATEBLOCK)
-      return PCRE_ERROR_DFA_BADRESTART; 
-  } 
+      return PCRE_ERROR_DFA_BADRESTART;
+  }


 /* Set up study, callout, and table data */
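
The restart check in the last hunk above can be read in isolation as follows. This is a sketch, not the library code: the function name is hypothetical, and the block size of three ints is an assumption standing in for INTS_PER_STATEBLOCK, whose definition is not part of this diff.

```c
/* Assumption for this sketch: each DFA state block occupies three ints.
   The real value comes from INTS_PER_STATEBLOCK in pcre_dfa_exec.c. */
#define STATEBLOCK_INTS 3

/* Returns 1 if the first two workspace entries look like a saved restart
   state: entry 0 must be 0 or 1, and entry 1 must be a count between 1
   and the number of state blocks that fit once two ints are set aside. */
int restart_workspace_ok(const int *workspace, int wscount)
{
  if ((workspace[0] & (-2)) != 0) return 0;   /* any bit other than bit 0 set */
  if (workspace[1] < 1) return 0;
  if (workspace[1] > (wscount - 2) / STATEBLOCK_INTS) return 0;
  return 1;
}
```

In the diff, a failed check makes the function return PCRE_ERROR_DFA_BADRESTART instead of continuing with the workspace contents.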


Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcre_exec.c    2012-06-02 11:03:06 UTC (rev 975)
@@ -1577,9 +1577,9 @@
         }
       md->mark = save_mark;


-      /* A COMMIT failure must fail the entire assertion, without trying any 
+      /* A COMMIT failure must fail the entire assertion, without trying any
       subsequent branches. */
-    
+
       if (rrc == MATCH_COMMIT) RRETURN(MATCH_NOMATCH);


       /* PCRE does not allow THEN to escape beyond an assertion; it


Modified: code/trunk/pcre_fullinfo.c
===================================================================
--- code/trunk/pcre_fullinfo.c    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcre_fullinfo.c    2012-06-02 11:03:06 UTC (rev 975)
@@ -192,10 +192,10 @@
   case PCRE_INFO_HASCRORLF:
   *((int *)where) = (re->flags & PCRE_HASCRORLF) != 0;
   break;
-  
-  case PCRE_INFO_MAXLOOKBEHIND: 
+
+  case PCRE_INFO_MAXLOOKBEHIND:
   *((int *)where) = re->max_lookbehind;
-  break;  
+  break;


   default: return PCRE_ERROR_BADOPTION;
   }

Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcre_internal.h    2012-06-02 11:03:06 UTC (rev 975)
@@ -2137,7 +2137,7 @@
   const  pcre_uchar *once_target; /* Where to back up to for atomic groups */
 #ifdef NO_RECURSE
   void  *match_frames_base;       /* For remembering malloc'd frames */
-#endif      
+#endif
 } match_data;


 /* A similar structure is used for the same purpose by the DFA matching

Modified: code/trunk/pcre_tables.c
===================================================================
--- code/trunk/pcre_tables.c    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcre_tables.c    2012-06-02 11:03:06 UTC (rev 975)
@@ -435,151 +435,151 @@
   STRING_Zs0;


 const ucp_type_table PRIV(utt)[] = {
- { 0, PT_ANY, 0 },
- { 4, PT_SC, ucp_Arabic },
- { 11, PT_SC, ucp_Armenian },
- { 20, PT_SC, ucp_Avestan },
- { 28, PT_SC, ucp_Balinese },
- { 37, PT_SC, ucp_Bamum },
- { 43, PT_SC, ucp_Batak },
- { 49, PT_SC, ucp_Bengali },
- { 57, PT_SC, ucp_Bopomofo },
- { 66, PT_SC, ucp_Brahmi },
- { 73, PT_SC, ucp_Braille },
- { 81, PT_SC, ucp_Buginese },
- { 90, PT_SC, ucp_Buhid },
- { 96, PT_GC, ucp_C },
- { 98, PT_SC, ucp_Canadian_Aboriginal },
- { 118, PT_SC, ucp_Carian },
- { 125, PT_PC, ucp_Cc },
- { 128, PT_PC, ucp_Cf },
- { 131, PT_SC, ucp_Chakma },
- { 138, PT_SC, ucp_Cham },
- { 143, PT_SC, ucp_Cherokee },
- { 152, PT_PC, ucp_Cn },
- { 155, PT_PC, ucp_Co },
- { 158, PT_SC, ucp_Common },
- { 165, PT_SC, ucp_Coptic },
- { 172, PT_PC, ucp_Cs },
- { 175, PT_SC, ucp_Cuneiform },
- { 185, PT_SC, ucp_Cypriot },
- { 193, PT_SC, ucp_Cyrillic },
- { 202, PT_SC, ucp_Deseret },
- { 210, PT_SC, ucp_Devanagari },
- { 221, PT_SC, ucp_Egyptian_Hieroglyphs },
- { 242, PT_SC, ucp_Ethiopic },
- { 251, PT_SC, ucp_Georgian },
- { 260, PT_SC, ucp_Glagolitic },
- { 271, PT_SC, ucp_Gothic },
- { 278, PT_SC, ucp_Greek },
- { 284, PT_SC, ucp_Gujarati },
- { 293, PT_SC, ucp_Gurmukhi },
- { 302, PT_SC, ucp_Han },
- { 306, PT_SC, ucp_Hangul },
- { 313, PT_SC, ucp_Hanunoo },
- { 321, PT_SC, ucp_Hebrew },
- { 328, PT_SC, ucp_Hiragana },
- { 337, PT_SC, ucp_Imperial_Aramaic },
- { 354, PT_SC, ucp_Inherited },
- { 364, PT_SC, ucp_Inscriptional_Pahlavi },
- { 386, PT_SC, ucp_Inscriptional_Parthian },
- { 409, PT_SC, ucp_Javanese },
- { 418, PT_SC, ucp_Kaithi },
- { 425, PT_SC, ucp_Kannada },
- { 433, PT_SC, ucp_Katakana },
- { 442, PT_SC, ucp_Kayah_Li },
- { 451, PT_SC, ucp_Kharoshthi },
- { 462, PT_SC, ucp_Khmer },
- { 468, PT_GC, ucp_L },
- { 470, PT_LAMP, 0 },
- { 473, PT_SC, ucp_Lao },
- { 477, PT_SC, ucp_Latin },
- { 483, PT_SC, ucp_Lepcha },
- { 490, PT_SC, ucp_Limbu },
- { 496, PT_SC, ucp_Linear_B },
- { 505, PT_SC, ucp_Lisu },
- { 510, PT_PC, ucp_Ll },
- { 513, PT_PC, ucp_Lm },
- { 516, PT_PC, ucp_Lo },
- { 519, PT_PC, ucp_Lt },
- { 522, PT_PC, ucp_Lu },
- { 525, PT_SC, ucp_Lycian },
- { 532, PT_SC, ucp_Lydian },
- { 539, PT_GC, ucp_M },
- { 541, PT_SC, ucp_Malayalam },
- { 551, PT_SC, ucp_Mandaic },
- { 559, PT_PC, ucp_Mc },
- { 562, PT_PC, ucp_Me },
- { 565, PT_SC, ucp_Meetei_Mayek },
- { 578, PT_SC, ucp_Meroitic_Cursive },
- { 595, PT_SC, ucp_Meroitic_Hieroglyphs },
- { 616, PT_SC, ucp_Miao },
- { 621, PT_PC, ucp_Mn },
- { 624, PT_SC, ucp_Mongolian },
- { 634, PT_SC, ucp_Myanmar },
- { 642, PT_GC, ucp_N },
- { 644, PT_PC, ucp_Nd },
- { 647, PT_SC, ucp_New_Tai_Lue },
- { 659, PT_SC, ucp_Nko },
- { 663, PT_PC, ucp_Nl },
- { 666, PT_PC, ucp_No },
- { 669, PT_SC, ucp_Ogham },
- { 675, PT_SC, ucp_Ol_Chiki },
- { 684, PT_SC, ucp_Old_Italic },
- { 695, PT_SC, ucp_Old_Persian },
- { 707, PT_SC, ucp_Old_South_Arabian },
- { 725, PT_SC, ucp_Old_Turkic },
- { 736, PT_SC, ucp_Oriya },
- { 742, PT_SC, ucp_Osmanya },
- { 750, PT_GC, ucp_P },
- { 752, PT_PC, ucp_Pc },
- { 755, PT_PC, ucp_Pd },
- { 758, PT_PC, ucp_Pe },
- { 761, PT_PC, ucp_Pf },
- { 764, PT_SC, ucp_Phags_Pa },
- { 773, PT_SC, ucp_Phoenician },
- { 784, PT_PC, ucp_Pi },
- { 787, PT_PC, ucp_Po },
- { 790, PT_PC, ucp_Ps },
- { 793, PT_SC, ucp_Rejang },
- { 800, PT_SC, ucp_Runic },
- { 806, PT_GC, ucp_S },
- { 808, PT_SC, ucp_Samaritan },
- { 818, PT_SC, ucp_Saurashtra },
- { 829, PT_PC, ucp_Sc },
- { 832, PT_SC, ucp_Sharada },
- { 840, PT_SC, ucp_Shavian },
- { 848, PT_SC, ucp_Sinhala },
- { 856, PT_PC, ucp_Sk },
- { 859, PT_PC, ucp_Sm },
- { 862, PT_PC, ucp_So },
- { 865, PT_SC, ucp_Sora_Sompeng },
- { 878, PT_SC, ucp_Sundanese },
- { 888, PT_SC, ucp_Syloti_Nagri },
- { 901, PT_SC, ucp_Syriac },
- { 908, PT_SC, ucp_Tagalog },
- { 916, PT_SC, ucp_Tagbanwa },
- { 925, PT_SC, ucp_Tai_Le },
- { 932, PT_SC, ucp_Tai_Tham },
- { 941, PT_SC, ucp_Tai_Viet },
- { 950, PT_SC, ucp_Takri },
- { 956, PT_SC, ucp_Tamil },
- { 962, PT_SC, ucp_Telugu },
- { 969, PT_SC, ucp_Thaana },
- { 976, PT_SC, ucp_Thai },
- { 981, PT_SC, ucp_Tibetan },
- { 989, PT_SC, ucp_Tifinagh },
- { 998, PT_SC, ucp_Ugaritic },
- { 1007, PT_SC, ucp_Vai },
- { 1011, PT_ALNUM, 0 },
- { 1015, PT_PXSPACE, 0 },
- { 1019, PT_SPACE, 0 },
- { 1023, PT_WORD, 0 },
- { 1027, PT_SC, ucp_Yi },
- { 1030, PT_GC, ucp_Z },
- { 1032, PT_PC, ucp_Zl },
- { 1035, PT_PC, ucp_Zp },
- { 1038, PT_PC, ucp_Zs }
+ { 0, PT_ANY, 0 },
+ { 4, PT_SC, ucp_Arabic },
+ { 11, PT_SC, ucp_Armenian },
+ { 20, PT_SC, ucp_Avestan },
+ { 28, PT_SC, ucp_Balinese },
+ { 37, PT_SC, ucp_Bamum },
+ { 43, PT_SC, ucp_Batak },
+ { 49, PT_SC, ucp_Bengali },
+ { 57, PT_SC, ucp_Bopomofo },
+ { 66, PT_SC, ucp_Brahmi },
+ { 73, PT_SC, ucp_Braille },
+ { 81, PT_SC, ucp_Buginese },
+ { 90, PT_SC, ucp_Buhid },
+ { 96, PT_GC, ucp_C },
+ { 98, PT_SC, ucp_Canadian_Aboriginal },
+ { 118, PT_SC, ucp_Carian },
+ { 125, PT_PC, ucp_Cc },
+ { 128, PT_PC, ucp_Cf },
+ { 131, PT_SC, ucp_Chakma },
+ { 138, PT_SC, ucp_Cham },
+ { 143, PT_SC, ucp_Cherokee },
+ { 152, PT_PC, ucp_Cn },
+ { 155, PT_PC, ucp_Co },
+ { 158, PT_SC, ucp_Common },
+ { 165, PT_SC, ucp_Coptic },
+ { 172, PT_PC, ucp_Cs },
+ { 175, PT_SC, ucp_Cuneiform },
+ { 185, PT_SC, ucp_Cypriot },
+ { 193, PT_SC, ucp_Cyrillic },
+ { 202, PT_SC, ucp_Deseret },
+ { 210, PT_SC, ucp_Devanagari },
+ { 221, PT_SC, ucp_Egyptian_Hieroglyphs },
+ { 242, PT_SC, ucp_Ethiopic },
+ { 251, PT_SC, ucp_Georgian },
+ { 260, PT_SC, ucp_Glagolitic },
+ { 271, PT_SC, ucp_Gothic },
+ { 278, PT_SC, ucp_Greek },
+ { 284, PT_SC, ucp_Gujarati },
+ { 293, PT_SC, ucp_Gurmukhi },
+ { 302, PT_SC, ucp_Han },
+ { 306, PT_SC, ucp_Hangul },
+ { 313, PT_SC, ucp_Hanunoo },
+ { 321, PT_SC, ucp_Hebrew },
+ { 328, PT_SC, ucp_Hiragana },
+ { 337, PT_SC, ucp_Imperial_Aramaic },
+ { 354, PT_SC, ucp_Inherited },
+ { 364, PT_SC, ucp_Inscriptional_Pahlavi },
+ { 386, PT_SC, ucp_Inscriptional_Parthian },
+ { 409, PT_SC, ucp_Javanese },
+ { 418, PT_SC, ucp_Kaithi },
+ { 425, PT_SC, ucp_Kannada },
+ { 433, PT_SC, ucp_Katakana },
+ { 442, PT_SC, ucp_Kayah_Li },
+ { 451, PT_SC, ucp_Kharoshthi },
+ { 462, PT_SC, ucp_Khmer },
+ { 468, PT_GC, ucp_L },
+ { 470, PT_LAMP, 0 },
+ { 473, PT_SC, ucp_Lao },
+ { 477, PT_SC, ucp_Latin },
+ { 483, PT_SC, ucp_Lepcha },
+ { 490, PT_SC, ucp_Limbu },
+ { 496, PT_SC, ucp_Linear_B },
+ { 505, PT_SC, ucp_Lisu },
+ { 510, PT_PC, ucp_Ll },
+ { 513, PT_PC, ucp_Lm },
+ { 516, PT_PC, ucp_Lo },
+ { 519, PT_PC, ucp_Lt },
+ { 522, PT_PC, ucp_Lu },
+ { 525, PT_SC, ucp_Lycian },
+ { 532, PT_SC, ucp_Lydian },
+ { 539, PT_GC, ucp_M },
+ { 541, PT_SC, ucp_Malayalam },
+ { 551, PT_SC, ucp_Mandaic },
+ { 559, PT_PC, ucp_Mc },
+ { 562, PT_PC, ucp_Me },
+ { 565, PT_SC, ucp_Meetei_Mayek },
+ { 578, PT_SC, ucp_Meroitic_Cursive },
+ { 595, PT_SC, ucp_Meroitic_Hieroglyphs },
+ { 616, PT_SC, ucp_Miao },
+ { 621, PT_PC, ucp_Mn },
+ { 624, PT_SC, ucp_Mongolian },
+ { 634, PT_SC, ucp_Myanmar },
+ { 642, PT_GC, ucp_N },
+ { 644, PT_PC, ucp_Nd },
+ { 647, PT_SC, ucp_New_Tai_Lue },
+ { 659, PT_SC, ucp_Nko },
+ { 663, PT_PC, ucp_Nl },
+ { 666, PT_PC, ucp_No },
+ { 669, PT_SC, ucp_Ogham },
+ { 675, PT_SC, ucp_Ol_Chiki },
+ { 684, PT_SC, ucp_Old_Italic },
+ { 695, PT_SC, ucp_Old_Persian },
+ { 707, PT_SC, ucp_Old_South_Arabian },
+ { 725, PT_SC, ucp_Old_Turkic },
+ { 736, PT_SC, ucp_Oriya },
+ { 742, PT_SC, ucp_Osmanya },
+ { 750, PT_GC, ucp_P },
+ { 752, PT_PC, ucp_Pc },
+ { 755, PT_PC, ucp_Pd },
+ { 758, PT_PC, ucp_Pe },
+ { 761, PT_PC, ucp_Pf },
+ { 764, PT_SC, ucp_Phags_Pa },
+ { 773, PT_SC, ucp_Phoenician },
+ { 784, PT_PC, ucp_Pi },
+ { 787, PT_PC, ucp_Po },
+ { 790, PT_PC, ucp_Ps },
+ { 793, PT_SC, ucp_Rejang },
+ { 800, PT_SC, ucp_Runic },
+ { 806, PT_GC, ucp_S },
+ { 808, PT_SC, ucp_Samaritan },
+ { 818, PT_SC, ucp_Saurashtra },
+ { 829, PT_PC, ucp_Sc },
+ { 832, PT_SC, ucp_Sharada },
+ { 840, PT_SC, ucp_Shavian },
+ { 848, PT_SC, ucp_Sinhala },
+ { 856, PT_PC, ucp_Sk },
+ { 859, PT_PC, ucp_Sm },
+ { 862, PT_PC, ucp_So },
+ { 865, PT_SC, ucp_Sora_Sompeng },
+ { 878, PT_SC, ucp_Sundanese },
+ { 888, PT_SC, ucp_Syloti_Nagri },
+ { 901, PT_SC, ucp_Syriac },
+ { 908, PT_SC, ucp_Tagalog },
+ { 916, PT_SC, ucp_Tagbanwa },
+ { 925, PT_SC, ucp_Tai_Le },
+ { 932, PT_SC, ucp_Tai_Tham },
+ { 941, PT_SC, ucp_Tai_Viet },
+ { 950, PT_SC, ucp_Takri },
+ { 956, PT_SC, ucp_Tamil },
+ { 962, PT_SC, ucp_Telugu },
+ { 969, PT_SC, ucp_Thaana },
+ { 976, PT_SC, ucp_Thai },
+ { 981, PT_SC, ucp_Tibetan },
+ { 989, PT_SC, ucp_Tifinagh },
+ { 998, PT_SC, ucp_Ugaritic },
+ { 1007, PT_SC, ucp_Vai },
+ { 1011, PT_ALNUM, 0 },
+ { 1015, PT_PXSPACE, 0 },
+ { 1019, PT_SPACE, 0 },
+ { 1023, PT_WORD, 0 },
+ { 1027, PT_SC, ucp_Yi },
+ { 1030, PT_GC, ucp_Z },
+ { 1032, PT_PC, ucp_Zl },
+ { 1035, PT_PC, ucp_Zp },
+ { 1038, PT_PC, ucp_Zs }
 };

 const int PRIV(utt_size) = sizeof(PRIV(utt)) / sizeof(ucp_type_table);

Modified: code/trunk/pcregrep.c
===================================================================
--- code/trunk/pcregrep.c    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcregrep.c    2012-06-02 11:03:06 UTC (rev 975)
@@ -251,7 +251,7 @@
   { OP_PATLIST,    'e',      NULL,              "regex(p)=pattern", "specify pattern (may be used more than once)" },
   { OP_NODATA,     'F',      NULL,              "fixed-strings", "patterns are sets of newline-separated strings" },
   { OP_STRING,     'f',      &pattern_filename, "file=path",     "read patterns from file" },
-  { OP_STRING,     N_FILE_LIST, &file_list,     "file-list=path","read files to search from file" }, 
+  { OP_STRING,     N_FILE_LIST, &file_list,     "file-list=path","read files to search from file" },
   { OP_NODATA,     N_FOFFSETS, NULL,            "file-offsets",  "output file offsets, not text" },
   { OP_NODATA,     'H',      NULL,              "with-filename", "force the prefixing filename on output" },
   { OP_NODATA,     'h',      NULL,              "no-filename",   "suppress the prefixing filename on output" },
@@ -1105,15 +1105,15 @@
 endptr = main_buffer + bufflength;


/* Unless binary-files=text, see if we have a binary file. This uses the same
-rule as GNU grep, namely, a search for a binary zero byte near the start of the
+rule as GNU grep, namely, a search for a binary zero byte near the start of the
file. */

 if (binary_files != BIN_TEXT)
   {
-  binary = 
+  binary =
     memchr(main_buffer, 0, (bufflength > 1024)? 1024 : bufflength) != NULL;
   if (binary && binary_files == BIN_NOMATCH) return 1;
-  } 
+  }


 /* Loop while the current pointer is not at the end of the file. For large
 files, endptr will be at the end of the buffer when we are in the middle of the
@@ -1230,16 +1230,16 @@
     /* Just count if just counting is wanted. */


     if (count_only) count++;
-    
-    /* When handling a binary file and binary-files==binary, the "binary" 
-    variable will be set true (it's false in all other cases). In this 
+
+    /* When handling a binary file and binary-files==binary, the "binary"
+    variable will be set true (it's false in all other cases). In this
     situation we just want to output the file name. No need to scan further. */
-    
+
     else if (binary)
       {
       fprintf(stdout, "Binary file %s matches\n", filename);
-      return 0;  
-      }   
+      return 0;
+      }


     /* If all we want is a file name, there is no need to scan any more lines
     in the file. */
@@ -1876,15 +1876,15 @@
   contains an underscore. */


   if (strchr(op->long_name, '_') != NULL) continue;
-  
+
   if (op->one_char > 0 && (op->long_name)[0] == 0)
     n = 31 - printf("  -%c", op->one_char);
-  else    
+  else
     {
-    if (op->one_char > 0) sprintf(s, "-%c,", op->one_char); 
+    if (op->one_char > 0) sprintf(s, "-%c,", op->one_char);
       else strcpy(s, "   ");
     n = 31 - printf("  %s --%s", s, op->long_name);
-    } 
+    }


   if (n < 1) n = 1;
   printf("%.*s%s\n", n, "                           ", op->help_text);
@@ -2356,7 +2356,7 @@


   /* If the option type is OP_PATLIST, it's the -e option, which can be called
   multiple times to create a list of patterns. */
-  
+
   if (op->type == OP_PATLIST)
     {
     if (cmd_pattern_count >= MAX_PATTERN_COUNT)
@@ -2367,9 +2367,9 @@
       }
     patterns[cmd_pattern_count++] = option_data;
     }
-    
+
   /* Handle OP_BINARY_FILES */
-  
+
   else if (op->type == OP_BINFILES)
     {
     if (strcmp(option_data, "binary") == 0)
@@ -2380,11 +2380,11 @@
       binary_files = BIN_TEXT;
     else
       {
-      fprintf(stderr, "pcregrep: unknown value \"%s\" for binary-files\n", 
-        option_data);  
+      fprintf(stderr, "pcregrep: unknown value \"%s\" for binary-files\n",
+        option_data);
       pcregrep_exit(usage(2));
-      }    
-    }   
+      }
+    }


/* Otherwise, deal with single string or numeric data values. */

@@ -2755,7 +2755,7 @@
     goto EXIT2;
     }
   }
-  
+
 /* If a file that contains a list of files to search has been specified, read
 it line by line and search the given files. Otherwise, if there are no further
 arguments, do the business on stdin and exit. */
@@ -2765,30 +2765,30 @@
   char buffer[PATBUFSIZE];
   FILE *fl;
   if (strcmp(file_list, "-") == 0) fl = stdin; else
-    { 
+    {
     fl = fopen(file_list, "rb");
     if (fl == NULL)
       {
-      fprintf(stderr, "pcregrep: Failed to open %s: %s\n", file_list, 
+      fprintf(stderr, "pcregrep: Failed to open %s: %s\n", file_list,
         strerror(errno));
       goto EXIT2;
-      } 
-    }   
+      }
+    }
   while (fgets(buffer, PATBUFSIZE, fl) != NULL)
     {
     int frc;
     char *end = buffer + (int)strlen(buffer);
     while (end > buffer && isspace(end[-1])) end--;
-    *end = 0;  
-    if (*buffer != 0) 
-      { 
-      frc = grep_or_recurse(buffer, dee_action == dee_RECURSE, FALSE); 
+    *end = 0;
+    if (*buffer != 0)
+      {
+      frc = grep_or_recurse(buffer, dee_action == dee_RECURSE, FALSE);
       if (frc > 1) rc = frc;
-        else if (frc == 0 && rc == 1) rc = 0;  
-      }   
-    }  
-  if (fl != stdin) fclose (fl);  
-  } 
+        else if (frc == 0 && rc == 1) rc = 0;
+      }
+    }
+  if (fl != stdin) fclose (fl);
+  }


/* Do this only if there was no file list (and no file arguments). */

@@ -2804,7 +2804,7 @@
at top level - this suppresses the file name if the argument is not a directory
and filenames are not otherwise forced. */

-only_one_at_top = i == argc - 1 && file_list == NULL;
+only_one_at_top = i == argc - 1 && file_list == NULL;

for (; i < argc; i++)
{

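Note on the binary-file check in the pcregrep.c hunk at -1105 above: the rule it describes (search for a zero byte near the start of the file) can be sketched in isolation as follows. This is only an illustrative sketch, not code from pcregrep. The helper name looks_binary and its limit parameter are hypothetical; the diff itself hard-codes 1024 and stores the result in the "binary" variable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of pcregrep): returns 1 if a zero byte
   occurs within the first `limit` bytes of buf, 0 otherwise. Only
   min(len, limit) bytes are scanned, mirroring the
   (bufflength > 1024)? 1024 : bufflength expression in the diff. */
static int looks_binary(const char *buf, size_t len, size_t limit)
{
  size_t scan = (len > limit) ? limit : len;
  if (scan == 0) return 0;               /* empty input: nothing to scan */
  return memchr(buf, 0, scan) != NULL;   /* zero byte found => binary */
}
```

Called as looks_binary(main_buffer, bufflength, 1024), this would give the same answer as the expression in the hunk. A zero byte that lies beyond the scan limit is not detected, so such a file is treated as text.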
Modified: code/trunk/pcreposix.c
===================================================================
--- code/trunk/pcreposix.c    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcreposix.c    2012-06-02 11:03:06 UTC (rev 975)
@@ -160,7 +160,7 @@
   REG_BADPAT,  /* disallowed UTF-8/16 code point (>= 0xd800 && <= 0xdfff) */
   REG_BADPAT,  /* invalid UTF-16 string (should not occur) */
   /* 75 */
-  REG_BADPAT   /* overlong MARK name */  
+  REG_BADPAT   /* overlong MARK name */
 };


/* Table of texts corresponding to POSIX error codes */

Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c    2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcretest.c    2012-06-02 11:03:06 UTC (rev 975)
@@ -737,7 +737,7 @@
   "JIT stack limit reached",
   "pattern compiled in wrong mode: 8-bit/16-bit error",
   "pattern compiled with other endianness",
-  "invalid data in workspace for DFA restart" 
+  "invalid data in workspace for DFA restart"
 };



@@ -2600,10 +2600,10 @@
int do_showcaprest = 0;
int do_flip = 0;
int erroroffset, len, delimiter, poffset;
-
-#if !defined NODFA
+
+#if !defined NODFA
int dfa_matched = 0;
-#endif
+#endif

   use_utf = 0;
   debug_lengths = 1;
@@ -3946,9 +3946,9 @@
             {
             fprintf(outfile, "Timing DFA restarts is not supported\n");
             break;
-            }    
-          if (dfa_workspace == NULL) 
-            dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));  
+            }
+          if (dfa_workspace == NULL)
+            dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));
           for (i = 0; i < timeitm; i++)
             {
             PCRE_DFA_EXEC(count, re, extra, bptr, len, start_offset,
@@ -4019,9 +4019,9 @@
 #if !defined NODFA
       else if (all_use_dfa || use_dfa)
         {
-        if (dfa_workspace == NULL) 
-          dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));  
-        if (dfa_matched++ == 0)  
+        if (dfa_workspace == NULL)
+          dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));
+        if (dfa_matched++ == 0)
           dfa_workspace[0] = -1;  /* To catch bad restart */
         PCRE_DFA_EXEC(count, re, extra, bptr, len, start_offset,
           (options | g_notempty), use_offsets, use_size_offsets, dfa_workspace,