From: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [429] code/trunk: Add pcredemo man page, containing a listing of pcredemo.c.
Revision: 429
          http://vcs.pcre.org/viewvc?view=rev&revision=429
Author:   ph10
Date:     2009-09-01 17:10:16 +0100 (Tue, 01 Sep 2009)


Log Message:
-----------
Add pcredemo man page, containing a listing of pcredemo.c.

Modified Paths:
--------------
    code/trunk/132html
    code/trunk/ChangeLog
    code/trunk/PrepareRelease
    code/trunk/README
    code/trunk/doc/html/index.html
    code/trunk/doc/html/pcre.html
    code/trunk/doc/html/pcre_dfa_exec.html
    code/trunk/doc/html/pcre_exec.html
    code/trunk/doc/html/pcre_fullinfo.html
    code/trunk/doc/html/pcreapi.html
    code/trunk/doc/html/pcrecompat.html
    code/trunk/doc/html/pcregrep.html
    code/trunk/doc/html/pcrematching.html
    code/trunk/doc/html/pcrepartial.html
    code/trunk/doc/html/pcreposix.html
    code/trunk/doc/html/pcresample.html
    code/trunk/doc/html/pcretest.html
    code/trunk/doc/index.html.src
    code/trunk/doc/pcre.3
    code/trunk/doc/pcre.txt
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcregrep.txt
    code/trunk/doc/pcresample.3
    code/trunk/doc/pcretest.txt


Added Paths:
-----------
    code/trunk/doc/html/pcredemo.html
    code/trunk/doc/pcredemo.3


Modified: code/trunk/132html
===================================================================
--- code/trunk/132html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/132html    2009-09-01 16:10:16 UTC (rev 429)
@@ -231,6 +231,23 @@
       $_ = "$one $two";
       redo;            # Process the joined lines
       }
+      
+    # .EX/.EE are used in the pcredemo page to bracket the entire program,
+    # which is unmodified except for turning backslash into "\e".
+    
+    elsif (/^\.EX\s*$/)
+      {
+      print TEMP "<PRE>\n";
+      while (<STDIN>)
+        {
+        last if /^\.EE\s*$/; 
+        s/\\e/\\/g;
+        s/&/&amp;/g;   
+        s/</&lt;/g;
+        s/>/&gt;/g;
+        print TEMP; 
+        }   
+      }     


     # Ignore anything not recognized



Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/ChangeLog    2009-09-01 16:10:16 UTC (rev 429)
@@ -72,6 +72,10 @@
     "g". If the first part-match was for the string "dog", restarting with
     "sbody" failed.


+13. Added a pcredemo man page, created automatically from the pcredemo.c file,
+    so that the demonstration program is easily available in environments where 
+    PCRE has not been installed from source.  
+    


Version 7.9 11-Apr-09
---------------------

Modified: code/trunk/PrepareRelease
===================================================================
--- code/trunk/PrepareRelease    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/PrepareRelease    2009-09-01 16:10:16 UTC (rev 429)
@@ -8,8 +8,8 @@
 # following files:


 # 132html     A Perl script that converts a .1 or .3 man page into HTML. It
-#             is called from MakeRelease. It "knows" the relevant troff
-#             constructs that are used in the PCRE man pages.
+#             "knows" the relevant troff constructs that are used in the PCRE
+#             man pages.


 # CleanTxt    A Perl script that cleans up the output of "nroff -man" by
 #             removing backspaces and other redundant text so as to produce
@@ -37,8 +37,9 @@
 This file contains a concatenation of the PCRE man pages, converted to plain
 text format for ease of searching with a text editor, or for use on systems
 that do not have a man page processor. The small individual files that give
-synopses of each function in the library have not been included. There are
-separate text files for the pcregrep and pcretest commands.
+synopses of each function in the library have not been included. Neither has 
+the pcredemo program. There are separate text files for the pcregrep and
+pcretest commands.
 -----------------------------------------------------------------------------



@@ -68,6 +69,41 @@
done


+# Make pcredemo.3 from the pcredemo.c source file
+
+echo "Making pcredemo.3"
+perl <<"END" >pcredemo.3
+  open(IN, "../pcredemo.c") || die "Failed to open pcredemo.c\n";
+  open(OUT, ">pcredemo.3") || die "Failed to open pcredemo.3\n";
+  print OUT ".\\\" Start example.\n" .
+            ".de EX\n" .
+            ".  nr mE \\\\n(.f\n" .
+            ".  nf\n" .
+            ".  nh\n" .
+            ".  ft CW\n" .
+            "..\n" .
+            ".\n" .
+            ".\n" .
+            ".\\\" End example.\n" .
+            ".de EE\n" .
+            ".  ft \\\\n(mE\n" .
+            ".  fi\n" .
+            ".  hy \\\\n(HY\n" .
+            "..\n" .
+            ".\n" .
+            ".EX\n" ; 
+  while (<IN>)
+    {
+    s/\\/\\e/g;
+    print OUT;
+    }
+  print OUT ".EE\n";
+  close(IN);
+  close(OUT);    
+END
+if [ $? != 0 ] ; then exit 1; fi
+
+
 # Make HTML form of the documentation.


echo "Making HTML documentation"
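
Editorial sketch (not part of the commit): the two Perl fragments in this revision form a round trip. PrepareRelease turns every backslash in pcredemo.c into the troff escape "\e" when it writes pcredemo.3, and the new .EX/.EE branch in 132html turns "\e" back into a backslash before escaping &, < and > for HTML. The C below is a minimal model of that round trip on a single line of text. The function names c_to_troff, troff_to_html and put are hypothetical and exist only for this illustration; the real work is done by the Perl shown in the diffs.

```c
#include <stddef.h>
#include <string.h>

/* Append len bytes of s to dst at *pos. Returns 0 if dst (capacity cap)
   cannot hold them plus a terminating NUL. Hypothetical helper. */
static int put(char *dst, size_t cap, size_t *pos, const char *s, size_t len)
{
  if (*pos + len >= cap) return 0;   /* keep one byte for the terminator */
  memcpy(dst + *pos, s, len);
  *pos += len;
  dst[*pos] = '\0';
  return 1;
}

/* Models the PrepareRelease step: every backslash becomes "\e". */
int c_to_troff(const char *src, char *dst, size_t cap)
{
  size_t pos = 0;
  if (cap == 0) return 0;
  dst[0] = '\0';
  for (; *src != '\0'; src++)
    {
    int ok = (*src == '\\') ? put(dst, cap, &pos, "\\e", 2)
                            : put(dst, cap, &pos, src, 1);
    if (!ok) return 0;
    }
  return 1;
}

/* Models the 132html step: "\e" goes back to a backslash, and the three
   HTML-significant characters are replaced by entities. */
int troff_to_html(const char *src, char *dst, size_t cap)
{
  size_t pos = 0;
  if (cap == 0) return 0;
  dst[0] = '\0';
  while (*src != '\0')
    {
    int ok;
    if (src[0] == '\\' && src[1] == 'e')
      { ok = put(dst, cap, &pos, "\\", 1); src += 2; }
    else if (*src == '&') { ok = put(dst, cap, &pos, "&amp;", 5); src++; }
    else if (*src == '<') { ok = put(dst, cap, &pos, "&lt;", 4); src++; }
    else if (*src == '>') { ok = put(dst, cap, &pos, "&gt;", 4); src++; }
    else { ok = put(dst, cap, &pos, src, 1); src++; }
    if (!ok) return 0;
    }
  return 1;
}
```

Both functions return 1 on success and 0 if the output buffer is too small. Feeding a line of C through c_to_troff and then troff_to_html should give the same text that appears inside <PRE> in pcredemo.html, for example "#include &lt;stdio.h&gt;" for an #include line.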

Modified: code/trunk/README
===================================================================
--- code/trunk/README    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/README    2009-09-01 16:10:16 UTC (rev 429)
@@ -712,7 +712,7 @@
                           )   "configure" and config.h
   depcomp                 ) script to find program dependencies, generated by
                           )   automake
-  doc/*.3                 man page sources for the PCRE functions
+  doc/*.3                 man page sources for PCRE
   doc/*.1                 man page sources for pcregrep and pcretest
   doc/index.html.src      the base HTML page
   doc/html/*              HTML documentation
@@ -765,4 +765,4 @@
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 15 August 2009
+Last updated: 01 September 2009


Modified: code/trunk/doc/html/index.html
===================================================================
--- code/trunk/doc/html/index.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/index.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -1,10 +1,10 @@
 <html>
-<!-- This is a manually maintained file that is the root of the HTML version of
-     the PCRE documentation. When the HTML documents are built from the man
-     page versions, the entire doc/html directory is emptied, this file is then
-     copied into doc/html/index.html, and the remaining files therein are
+<!-- This is a manually maintained file that is the root of the HTML version of 
+     the PCRE documentation. When the HTML documents are built from the man 
+     page versions, the entire doc/html directory is emptied, this file is then 
+     copied into doc/html/index.html, and the remaining files therein are 
      created by the 132html script.
--->
+-->      
 <head>
 <title>PCRE specification</title>
 </head>
@@ -36,6 +36,9 @@
 <tr><td><a href="pcrecpp.html">pcrecpp</a></td>
     <td>&nbsp;&nbsp;The C++ wrapper for the PCRE library</td></tr>


+<tr><td><a href="pcredemo.html">pcredemo</a></td>
+    <td>&nbsp;&nbsp;A demonstration C program that uses the PCRE library</td></tr>
+
 <tr><td><a href="pcregrep.html">pcregrep</a></td>
     <td>&nbsp;&nbsp;The <b>pcregrep</b> command</td></tr>


@@ -58,7 +61,7 @@
     <td>&nbsp;&nbsp;How to save and re-use compiled patterns</td></tr>


 <tr><td><a href="pcresample.html">pcresample</a></td>
-    <td>&nbsp;&nbsp;Description of the sample program</td></tr>
+    <td>&nbsp;&nbsp;Discussion of the pcredemo program</td></tr>


 <tr><td><a href="pcrestack.html">pcrestack</a></td>
     <td>&nbsp;&nbsp;Discussion of PCRE's stack usage</td></tr>
@@ -71,11 +74,11 @@
 </table>


<p>
-There are also individual pages that summarize the interface for each function
+There are also individual pages that summarize the interface for each function
in the library:
</p>

-<table>
+<table>    


 <tr><td><a href="pcre_compile.html">pcre_compile</a></td>
     <td>&nbsp;&nbsp;Compile a regular expression</td></tr>
@@ -126,7 +129,7 @@


 <tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
     <td>&nbsp;&nbsp;Build character tables in current locale</td></tr>
-
+    
 <tr><td><a href="pcre_refcount.html">pcre_refcount</a></td>
     <td>&nbsp;&nbsp;Maintain reference count in compiled pattern</td></tr>



Modified: code/trunk/doc/html/pcre.html
===================================================================
--- code/trunk/doc/html/pcre.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/pcre.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -30,8 +30,8 @@
 requesting some minor changes that give better JavaScript compatibility.
 </P>
 <P>
-The current implementation of PCRE (release 7.x) corresponds approximately with
-Perl 5.10, including support for UTF-8 encoded strings and Unicode general
+The current implementation of PCRE (release 8.xx) corresponds approximately
+with Perl 5.10, including support for UTF-8 encoded strings and Unicode general
 category properties. However, UTF-8 and Unicode support has to be explicitly
 enabled; it is not the default. The Unicode tables correspond to Unicode
 release 5.1.
@@ -88,8 +88,8 @@
 The user documentation for PCRE comprises a number of different sections. In
 the "man" format, each of these is a separate "man page". In the HTML format,
 each is a separate page, linked from the index page. In the plain text format,
-all the sections are concatenated, for ease of searching. The sections are as
-follows:
+all the sections, except the <b>pcredemo</b> section, are concatenated, for ease
+of searching. The sections are as follows:
 <pre>
   pcre              this document
   pcre-config       show PCRE installation configuration information
@@ -98,6 +98,7 @@
   pcrecallout       details of the callout feature
   pcrecompat        discussion of Perl compatibility
   pcrecpp           details of the C++ wrapper
+  pcredemo          a demonstration C program that uses PCRE
   pcregrep          description of the <b>pcregrep</b> command
   pcrematching      discussion of the two matching algorithms
   pcrepartial       details of the partial matching facility
@@ -106,7 +107,7 @@
   pcreperform       discussion of performance issues
   pcreposix         the POSIX-compatible C API
   pcreprecompile    details of saving and re-using precompiled patterns
-  pcresample        discussion of the sample program
+  pcresample        discussion of the pcredemo program
   pcrestack         discussion of stack usage
   pcretest          description of the <b>pcretest</b> testing command
 </pre>
@@ -297,7 +298,7 @@
 </P>
 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 11 April 2009
+Last updated: 01 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcre_dfa_exec.html
===================================================================
--- code/trunk/doc/html/pcre_dfa_exec.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/pcre_dfa_exec.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -63,14 +63,19 @@
   PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
                        validity (only relevant if PCRE_UTF8
                        was set at compile time)
-  PCRE_PARTIAL       Return PCRE_ERROR_PARTIAL for a partial match
+  PCRE_PARTIAL       ) Return PCRE_ERROR_PARTIAL for a partial match 
+  PCRE_PARTIAL_SOFT  )   if no full matches are found
+  PCRE_PARTIAL_HARD  Return PCRE_ERROR_PARTIAL for a partial match 
+                       even if there is a full match as well 
   PCRE_DFA_SHORTEST  Return only the shortest match
   PCRE_DFA_RESTART   This is a restart after a partial match
 </pre>
 There are restrictions on what may appear in a pattern when using this matching
 function. Details are given in the
 <a href="pcrematching.html"><b>pcrematching</b></a>
-documentation.
+documentation. For details of partial matching, see the
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+page.
 </P>
 <P>
 A <b>pcre_extra</b> structure contains the following fields:


Modified: code/trunk/doc/html/pcre_exec.html
===================================================================
--- code/trunk/doc/html/pcre_exec.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/pcre_exec.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -59,15 +59,14 @@
   PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
                        validity (only relevant if PCRE_UTF8
                        was set at compile time)
-  PCRE_PARTIAL       Return PCRE_ERROR_PARTIAL for a partial match
+  PCRE_PARTIAL       ) Return PCRE_ERROR_PARTIAL for a partial match 
+  PCRE_PARTIAL_SOFT  )   if no full matches are found
+  PCRE_PARTIAL_HARD  Return PCRE_ERROR_PARTIAL for a partial match 
+                       even if there is a full match as well 
 </pre>
-There are restrictions on what may appear in a pattern when partial matching is
-requested. For details, see the
+For details of partial matching, see the
 <a href="pcrepartial.html"><b>pcrepartial</b></a>
-page.
-</P>
-<P>
-A <b>pcre_extra</b> structure contains the following fields:
+page. A <b>pcre_extra</b> structure contains the following fields:
 <pre>
   <i>flags</i>        Bits indicating which fields are set
   <i>study_data</i>   Opaque data from <b>pcre_study()</b>


Modified: code/trunk/doc/html/pcre_fullinfo.html
===================================================================
--- code/trunk/doc/html/pcre_fullinfo.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/pcre_fullinfo.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -49,6 +49,7 @@
   PCRE_INFO_NAMEENTRYSIZE   Size of name table entry
   PCRE_INFO_NAMETABLE       Pointer to name table
   PCRE_INFO_OKPARTIAL       Return 1 if partial matching can be tried
+                              (always returns 1 after release 8.00)
   PCRE_INFO_OPTIONS         Option bits used for compilation
   PCRE_INFO_SIZE            Size of compiled pattern
   PCRE_INFO_STUDYSIZE       Size of study data


Modified: code/trunk/doc/html/pcreapi.html
===================================================================
--- code/trunk/doc/html/pcreapi.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/pcreapi.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -164,8 +164,10 @@
 The functions <b>pcre_compile()</b>, <b>pcre_compile2()</b>, <b>pcre_study()</b>,
 and <b>pcre_exec()</b> are used for compiling and matching regular expressions
 in a Perl-compatible manner. A sample program that demonstrates the simplest
-way of using them is provided in the file called <i>pcredemo.c</i> in the source
-distribution. The
+way of using them is provided in the file called <i>pcredemo.c</i> in the PCRE
+source distribution. A listing of this program is given in the
+<a href="pcredemo.html"><b>pcredemo</b></a>
+documentation, and the
 <a href="pcresample.html"><b>pcresample</b></a>
 documentation describes how to compile and run it.
 </P>
@@ -1016,10 +1018,11 @@
   PCRE_INFO_OKPARTIAL
 </pre>
 Return 1 if the pattern can be used for partial matching, otherwise 0. The
-fourth argument should point to an <b>int</b> variable. The
+fourth argument should point to an <b>int</b> variable. From release 8.00, this
+always returns 1, because the restrictions that previously applied to partial
+matching have been lifted. The
 <a href="pcrepartial.html"><b>pcrepartial</b></a>
-documentation lists the restrictions that apply to patterns when partial
-matching is used.
+documentation gives details of partial matching.
 <pre>
   PCRE_INFO_OPTIONS
 </pre>
@@ -1246,7 +1249,7 @@
 The unused bits of the <i>options</i> argument for <b>pcre_exec()</b> must be
 zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_<i>xxx</i>,
 PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_START_OPTIMIZE,
-PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.
+PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and PCRE_PARTIAL_HARD.
 <pre>
   PCRE_ANCHORED
 </pre>
@@ -1336,7 +1339,9 @@
 matching a null string by first trying the match again at the same offset with
 PCRE_NOTEMPTY and PCRE_ANCHORED, and then if that fails by advancing the
 starting offset (see below) and trying an ordinary match again. There is some
-code that demonstrates how to do this in the <i>pcredemo.c</i> sample program.
+code that demonstrates how to do this in the 
+<a href="pcredemo.html"><b>pcredemo</b></a>
+sample program.
 <pre>
   PCRE_NO_START_OPTIMIZE
 </pre>
@@ -1373,15 +1378,19 @@
 subject, or a value of <i>startoffset</i> that does not point to the start of a
 UTF-8 character, is undefined. Your program may crash.
 <pre>
-  PCRE_PARTIAL
+  PCRE_PARTIAL_HARD 
+  PCRE_PARTIAL_SOFT
 </pre>
-This option turns on the partial matching feature. If the subject string fails
-to match the pattern, but at some point during the matching process the end of
-the subject was reached (that is, the subject partially matches the pattern and
-the failure to match occurred only because there were not enough subject
-characters), <b>pcre_exec()</b> returns PCRE_ERROR_PARTIAL instead of
-PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is used, there are restrictions on what
-may appear in the pattern. These are discussed in the
+These options turn on the partial matching feature. For backwards
+compatibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial match
+occurs if the end of the subject string is reached successfully, but there are
+not enough subject characters to complete the match. If this happens when
+PCRE_PARTIAL_HARD is set, <b>pcre_exec()</b> immediately returns
+PCRE_ERROR_PARTIAL. Otherwise, if PCRE_PARTIAL_SOFT is set, matching continues
+by testing any other alternatives. Only if they all fail is PCRE_ERROR_PARTIAL
+returned (instead of PCRE_ERROR_NOMATCH). The portion of the string that
+provided the partial match is set as the first matching string. There is a more
+detailed discussion in the
 <a href="pcrepartial.html"><b>pcrepartial</b></a>
 documentation.
 </P>
@@ -1582,10 +1591,10 @@
 <pre>
   PCRE_ERROR_BADPARTIAL     (-13)
 </pre>
-The PCRE_PARTIAL option was used with a compiled pattern containing items that
-are not supported for partial matching. See the
-<a href="pcrepartial.html"><b>pcrepartial</b></a>
-documentation for details of partial matching.
+This code is no longer in use. It was formerly returned when the PCRE_PARTIAL
+option was used with a compiled pattern containing items that were not
+supported for partial matching. From release 8.00 onwards, there are no 
+restrictions on partial matching.
 <pre>
   PCRE_ERROR_INTERNAL       (-14)
 </pre>
@@ -1871,19 +1880,24 @@
 <P>
 The unused bits of the <i>options</i> argument for <b>pcre_dfa_exec()</b> must be
 zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_<i>xxx</i>,
-PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL,
-PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last three of these are
-the same as for <b>pcre_exec()</b>, so their description is not repeated here.
+PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD,
+PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
+four of these are exactly the same as for <b>pcre_exec()</b>, so their
+description is not repeated here.
 <pre>
-  PCRE_PARTIAL
+  PCRE_PARTIAL_HARD
+  PCRE_PARTIAL_SOFT 
 </pre>
-This has the same general effect as it does for <b>pcre_exec()</b>, but the
-details are slightly different. When PCRE_PARTIAL is set for
-<b>pcre_dfa_exec()</b>, the return code PCRE_ERROR_NOMATCH is converted into
-PCRE_ERROR_PARTIAL if the end of the subject is reached, there have been no
-complete matches, but there is still at least one matching possibility. The
-portion of the string that provided the partial match is set as the first
-matching string.
+These have the same general effect as they do for <b>pcre_exec()</b>, but the
+details are slightly different. When PCRE_PARTIAL_HARD is set for
+<b>pcre_dfa_exec()</b>, it returns PCRE_ERROR_PARTIAL if the end of the subject
+is reached and there is still at least one matching possibility that requires
+additional characters. This happens even if some complete matches have also
+been found. When PCRE_PARTIAL_SOFT is set, the return code PCRE_ERROR_NOMATCH
+is converted into PCRE_ERROR_PARTIAL if the end of the subject is reached,
+there have been no complete matches, but there is still at least one matching
+possibility. The portion of the string that provided the longest partial match
+is set as the first matching string in both cases.
 <pre>
   PCRE_DFA_SHORTEST
 </pre>
@@ -1894,13 +1908,12 @@
 <pre>
   PCRE_DFA_RESTART
 </pre>
-When <b>pcre_dfa_exec()</b> is called with the PCRE_PARTIAL option, and returns
-a partial match, it is possible to call it again, with additional subject
-characters, and have it continue with the same match. The PCRE_DFA_RESTART
-option requests this action; when it is set, the <i>workspace</i> and
-<i>wscount</i> options must reference the same vector as before because data
-about the match so far is left in them after a partial match. There is more
-discussion of this facility in the
+When <b>pcre_dfa_exec()</b> returns a partial match, it is possible to call it
+again, with additional subject characters, and have it continue with the same
+match. The PCRE_DFA_RESTART option requests this action; when it is set, the
+<i>workspace</i> and <i>wscount</i> options must reference the same vector as
+before because data about the match so far is left in them after a partial
+match. There is more discussion of this facility in the
 <a href="pcrepartial.html"><b>pcrepartial</b></a>
 documentation.
 </P>
@@ -1996,7 +2009,7 @@
 </P>
 <br><a name="SEC22" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 11 April 2009
+Last updated: 01 September 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>
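
Editorial sketch (not part of the commit): the revised pcreapi text above distinguishes PCRE_PARTIAL_HARD, which returns PCRE_ERROR_PARTIAL as soon as the subject runs out in the middle of a match, from PCRE_PARTIAL_SOFT, which first tries the remaining alternatives and reports a partial match only if none of them matches completely. The toy model below illustrates just that decision rule. It assumes a "pattern" that is an ordered list of literal alternatives anchored at the start of the subject, and it requires at least one subject character for a partial match. It does not call PCRE, and every name in it (toy_mode, toy_result, toy_match) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch only; they are not PCRE identifiers. */
enum toy_mode   { TOY_NO_PARTIAL, TOY_PARTIAL_SOFT, TOY_PARTIAL_HARD };
enum toy_result { TOY_NOMATCH, TOY_PARTIAL, TOY_MATCH };

/* Try each literal alternative in order against the start of subject. */
enum toy_result toy_match(const char **alts, size_t nalts,
                          const char *subject, enum toy_mode mode)
{
  size_t slen = strlen(subject);
  int saw_partial = 0;
  size_t i;

  for (i = 0; i < nalts; i++)
    {
    size_t alen = strlen(alts[i]);
    if (alen <= slen)
      {
      /* The whole alternative fits: a complete match if it agrees. */
      if (strncmp(alts[i], subject, alen) == 0) return TOY_MATCH;
      }
    else if (slen > 0 && strncmp(alts[i], subject, slen) == 0)
      {
      /* The subject ran out while everything so far matched. */
      if (mode == TOY_PARTIAL_HARD) return TOY_PARTIAL;   /* stop at once */
      if (mode == TOY_PARTIAL_SOFT) saw_partial = 1;      /* keep looking */
      }
    }

  /* Soft mode reports the partial match only when nothing matched fully. */
  return saw_partial ? TOY_PARTIAL : TOY_NOMATCH;
}
```

With the alternatives "category" then "cat" and the subject "cat", the hard mode returns TOY_PARTIAL because the first alternative runs out of subject, whereas the soft mode goes on to the second alternative and returns TOY_MATCH. The pcrepartial page remains the authority on how pcre_exec() and pcre_dfa_exec() actually behave.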


Modified: code/trunk/doc/html/pcrecompat.html
===================================================================
--- code/trunk/doc/html/pcrecompat.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/pcrecompat.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -19,7 +19,7 @@
 This document describes the differences in the ways that PCRE and Perl handle
 regular expressions. The differences described here are mainly with respect to
 Perl 5.8, though PCRE versions 7.0 and later contain some features that are
-expected to be in the forthcoming Perl 5.10.
+in Perl 5.10.
 </P>
 <P>
 1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
@@ -170,9 +170,9 @@
 REVISION
 </b><br>
 <P>
-Last updated: 11 September 2007
+Last updated: 25 August 2009
 <br>
-Copyright &copy; 1997-2007 University of Cambridge.
+Copyright &copy; 1997-2009 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.


Added: code/trunk/doc/html/pcredemo.html
===================================================================
--- code/trunk/doc/html/pcredemo.html                            (rev 0)
+++ code/trunk/doc/html/pcredemo.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -0,0 +1,354 @@
+<html>
+<head>
+<title>pcredemo specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcredemo man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+</ul>
+<PRE>
+/*************************************************
+*           PCRE DEMONSTRATION PROGRAM           *
+*************************************************/
+
+/* This is a demonstration program to illustrate the most straightforward ways
+of calling the PCRE regular expression library from a C program. See the
+pcresample documentation for a short discussion ("man pcresample" if you have
+the PCRE man pages installed).
+
+In Unix-like environments, compile this program thuswise:
+
+  gcc -Wall pcredemo.c -I/usr/local/include -L/usr/local/lib \
+    -R/usr/local/lib -lpcre
+
+Replace "/usr/local/include" and "/usr/local/lib" with wherever the include and
+library files for PCRE are installed on your system. You don't need -I and -L
+if PCRE is installed in the standard system libraries. Only some operating
+systems (e.g. Solaris) use the -R option.
+
+Building under Windows:
+
+If you want to statically link this program against a non-dll .a file, you must
+define PCRE_STATIC before including pcre.h, otherwise the pcre_malloc() and
+pcre_free() exported functions will be declared __declspec(dllimport), with
+unwanted results. So in this environment, uncomment the following line. */
+
+/* #define PCRE_STATIC */
+
+#include &lt;stdio.h&gt;
+#include &lt;string.h&gt;
+#include &lt;pcre.h&gt;
+
+#define OVECCOUNT 30    /* should be a multiple of 3 */
+
+
+int main(int argc, char **argv)
+{
+pcre *re;
+const char *error;
+char *pattern;
+char *subject;
+unsigned char *name_table;
+int erroffset;
+int find_all;
+int namecount;
+int name_entry_size;
+int ovector[OVECCOUNT];
+int subject_length;
+int rc, i;
+
+
+/**************************************************************************
+* First, sort out the command line. There is only one possible option at  *
+* the moment, "-g" to request repeated matching to find all occurrences,  *
+* like Perl's /g option. We set the variable find_all to a non-zero value *
+* if the -g option is present. Apart from that, there must be exactly two *
+* arguments.                                                              *
+**************************************************************************/
+
+find_all = 0;
+for (i = 1; i &lt; argc; i++)
+  {
+  if (strcmp(argv[i], "-g") == 0) find_all = 1;
+    else break;
+  }
+
+/* After the options, we require exactly two arguments, which are the pattern,
+and the subject string. */
+
+if (argc - i != 2)
+  {
+  printf("Two arguments required: a regex and a subject string\n");
+  return 1;
+  }
+
+pattern = argv[i];
+subject = argv[i+1];
+subject_length = (int)strlen(subject);
+
+
+/*************************************************************************
+* Now we are going to compile the regular expression pattern, and handle *
+* any errors that are detected.                                          *
+*************************************************************************/
+
+re = pcre_compile(
+  pattern,              /* the pattern */
+  0,                    /* default options */
+  &amp;error,               /* for error message */
+  &amp;erroffset,           /* for error offset */
+  NULL);                /* use default character tables */
+
+/* Compilation failed: print the error message and exit */
+
+if (re == NULL)
+  {
+  printf("PCRE compilation failed at offset %d: %s\n", erroffset, error);
+  return 1;
+  }
+
+
+/*************************************************************************
+* If the compilation succeeded, we call PCRE again, in order to do a     *
+* pattern match against the subject string. This does just ONE match. If *
+* further matching is needed, it will be done below.                     *
+*************************************************************************/
+
+rc = pcre_exec(
+  re,                   /* the compiled pattern */
+  NULL,                 /* no extra data - we didn't study the pattern */
+  subject,              /* the subject string */
+  subject_length,       /* the length of the subject */
+  0,                    /* start at offset 0 in the subject */
+  0,                    /* default options */
+  ovector,              /* output vector for substring information */
+  OVECCOUNT);           /* number of elements in the output vector */
+
+/* Matching failed: handle error cases */
+
+if (rc &lt; 0)
+  {
+  switch(rc)
+    {
+    case PCRE_ERROR_NOMATCH: printf("No match\n"); break;
+    /*
+    Handle other special cases if you like
+    */
+    default: printf("Matching error %d\n", rc); break;
+    }
+  pcre_free(re);     /* Release memory used for the compiled pattern */
+  return 1;
+  }
+
+/* Match succeeded */
+
+printf("\nMatch succeeded at offset %d\n", ovector[0]);
+
+
+/*************************************************************************
+* We have found the first match within the subject string. If the output *
+* vector wasn't big enough, say so. Then output any substrings that were *
+* captured.                                                              *
+*************************************************************************/
+
+/* The output vector wasn't big enough */
+
+if (rc == 0)
+  {
+  rc = OVECCOUNT/3;
+  printf("ovector only has room for %d captured substrings\n", rc - 1);
+  }
+
+/* Show substrings stored in the output vector by number. Obviously, in a real
+application you might want to do things other than print them. */
+
+for (i = 0; i &lt; rc; i++)
+  {
+  char *substring_start = subject + ovector[2*i];
+  int substring_length = ovector[2*i+1] - ovector[2*i];
+  printf("%2d: %.*s\n", i, substring_length, substring_start);
+  }
+
+
+/**************************************************************************
+* That concludes the basic part of this demonstration program. We have    *
+* compiled a pattern, and performed a single match. The code that follows *
+* shows first how to access named substrings, and then how to code for    *
+* repeated matches on the same subject.                                   *
+**************************************************************************/
+
+/* See if there are any named substrings, and if so, show them by name. First
+we have to extract the count of named parentheses from the pattern. */
+
+(void)pcre_fullinfo(
+  re,                   /* the compiled pattern */
+  NULL,                 /* no extra data - we didn't study the pattern */
+  PCRE_INFO_NAMECOUNT,  /* number of named substrings */
+  &amp;namecount);          /* where to put the answer */
+
+if (namecount &lt;= 0) printf("No named substrings\n"); else
+  {
+  unsigned char *tabptr;
+  printf("Named substrings\n");
+
+  /* Before we can access the substrings, we must extract the table for
+  translating names to numbers, and the size of each entry in the table. */
+
+  (void)pcre_fullinfo(
+    re,                       /* the compiled pattern */
+    NULL,                     /* no extra data - we didn't study the pattern */
+    PCRE_INFO_NAMETABLE,      /* address of the table */
+    &amp;name_table);             /* where to put the answer */
+
+  (void)pcre_fullinfo(
+    re,                       /* the compiled pattern */
+    NULL,                     /* no extra data - we didn't study the pattern */
+    PCRE_INFO_NAMEENTRYSIZE,  /* size of each entry in the table */
+    &amp;name_entry_size);        /* where to put the answer */
+
+  /* Now we can scan the table and, for each entry, print the number, the name,
+  and the substring itself. */
+
+  tabptr = name_table;
+  for (i = 0; i &lt; namecount; i++)
+    {
+    int n = (tabptr[0] &lt;&lt; 8) | tabptr[1];
+    printf("(%d) %*s: %.*s\n", n, name_entry_size - 3, tabptr + 2,
+      ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]);
+    tabptr += name_entry_size;
+    }
+  }
+
+
+/*************************************************************************
+* If the "-g" option was given on the command line, we want to continue  *
+* to search for additional matches in the subject string, in a similar   *
+* way to the /g option in Perl. This turns out to be trickier than you   *
+* might think because of the possibility of matching an empty string.    *
+* What happens is as follows:                                            *
+*                                                                        *
+* If the previous match was NOT for an empty string, we can just start   *
+* the next match at the end of the previous one.                         *
+*                                                                        *
+* If the previous match WAS for an empty string, we can't do that, as it *
+* would lead to an infinite loop. Instead, a special call of pcre_exec() *
+* is made with the PCRE_NOTEMPTY and PCRE_ANCHORED flags set. The first  *
+* of these tells PCRE that an empty string is not a valid match; other   *
+* possibilities must be tried. The second flag restricts PCRE to one     *
+* match attempt at the initial string position. If this match succeeds,  *
+* an alternative to the empty string match has been found, and we can    *
+* proceed round the loop.                                                *
+*************************************************************************/
+
+if (!find_all)
+  {
+  pcre_free(re);   /* Release the memory used for the compiled pattern */
+  return 0;        /* Finish unless -g was given */
+  }
+
+/* Loop for second and subsequent matches */
+
+for (;;)
+  {
+  int options = 0;                 /* Normally no options */
+  int start_offset = ovector[1];   /* Start at end of previous match */
+
+  /* If the previous match was for an empty string, we are finished if we are
+  at the end of the subject. Otherwise, arrange to run another match at the
+  same point to see if a non-empty match can be found. */
+
+  if (ovector[0] == ovector[1])
+    {
+    if (ovector[0] == subject_length) break;
+    options = PCRE_NOTEMPTY | PCRE_ANCHORED;
+    }
+
+  /* Run the next matching operation */
+
+  rc = pcre_exec(
+    re,                   /* the compiled pattern */
+    NULL,                 /* no extra data - we didn't study the pattern */
+    subject,              /* the subject string */
+    subject_length,       /* the length of the subject */
+    start_offset,         /* starting offset in the subject */
+    options,              /* options */
+    ovector,              /* output vector for substring information */
+    OVECCOUNT);           /* number of elements in the output vector */
+
+  /* This time, a result of NOMATCH isn't an error. If the value in "options"
+  is zero, it just means we have found all possible matches, so the loop ends.
+  Otherwise, it means we have failed to find a non-empty-string match at a
+  point where there was a previous empty-string match. In this case, we do what
+  Perl does: advance the matching position by one, and continue. We do this by
+  setting the "end of previous match" offset, because that is picked up at the
+  top of the loop as the point at which to start again. */
+
+  if (rc == PCRE_ERROR_NOMATCH)
+    {
+    if (options == 0) break;
+    ovector[1] = start_offset + 1;
+    continue;    /* Go round the loop again */
+    }
+
+  /* Other matching errors are not recoverable. */
+
+  if (rc &lt; 0)
+    {
+    printf("Matching error %d\n", rc);
+    pcre_free(re);    /* Release memory used for the compiled pattern */
+    return 1;
+    }
+
+  /* Match succeeded */
+
+  printf("\nMatch succeeded again at offset %d\n", ovector[0]);
+
+  /* The match succeeded, but the output vector wasn't big enough. */
+
+  if (rc == 0)
+    {
+    rc = OVECCOUNT/3;
+    printf("ovector only has room for %d captured substrings\n", rc - 1);
+    }
+
+  /* As before, show substrings stored in the output vector by number, and then
+  also any named substrings. */
+
+  for (i = 0; i &lt; rc; i++)
+    {
+    char *substring_start = subject + ovector[2*i];
+    int substring_length = ovector[2*i+1] - ovector[2*i];
+    printf("%2d: %.*s\n", i, substring_length, substring_start);
+    }
+
+  if (namecount &lt;= 0) printf("No named substrings\n"); else
+    {
+    unsigned char *tabptr = name_table;
+    printf("Named substrings\n");
+    for (i = 0; i &lt; namecount; i++)
+      {
+      int n = (tabptr[0] &lt;&lt; 8) | tabptr[1];
+      printf("(%d) %*s: %.*s\n", n, name_entry_size - 3, tabptr + 2,
+        ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]);
+      tabptr += name_entry_size;
+      }
+    }
+  }      /* End of loop to find second and subsequent matches */
+
+printf("\n");
+pcre_free(re);       /* Release memory used for the compiled pattern */
+return 0;
+}
+
+/* End of pcredemo.c */
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>

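The name-table scan in the pcredemo listing reads each entry as a two-byte group number, most significant byte first, followed by the zero-terminated name. Here is a minimal sketch of that decoding step, assuming only the layout used in the listing. The helper name_table_entry() and the hand-built demo_table are hypothetical and are not part of PCRE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: decode entry `index` of a name
   table laid out as pcredemo.c reads it. Each entry is `entry_size` bytes:
   a two-byte group number, most significant byte first, followed by the
   zero-terminated group name. Returns the group number and stores a pointer
   to the name in *name. */
int name_table_entry(const unsigned char *table, int entry_size, int index,
                     const char **name)
{
  const unsigned char *entry;
  assert(entry_size >= 3 && index >= 0);
  entry = table + (size_t)index * (size_t)entry_size;
  *name = (const char *)(entry + 2);
  return (entry[0] << 8) | entry[1];
}

/* Hand-built table for illustration only: two entries of 8 bytes each,
   shaped like what a pattern with groups named "year" (group 1) and
   "month" (group 2) might produce. */
const unsigned char demo_table[] = {
  0, 2, 'm', 'o', 'n', 't', 'h', 0,
  0, 1, 'y', 'e', 'a', 'r', 0, 0
};
```

With this table, entry 0 decodes to group 2 ("month") and entry 1 to group 1 ("year"). In a real program the table pointer and entry size come from pcre_fullinfo(), as in the listing.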

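The "-g" loop in the pcredemo listing depends on a retry after an empty match: one anchored, non-empty attempt at the same position, then a one-character advance if that fails. The sketch below reproduces only that control flow. It assumes a toy matcher for the fixed pattern a* in place of pcre_exec(), and TOY_NOTEMPTY, TOY_ANCHORED, toy_exec(), record_match() and toy_find_all() are all hypothetical names.

```c
#include <assert.h>

#define TOY_NOTEMPTY 0x1   /* hypothetical stand-in for PCRE_NOTEMPTY */
#define TOY_ANCHORED 0x2   /* hypothetical stand-in for PCRE_ANCHORED */

/* Toy matcher for the fixed pattern a* (this is not PCRE). It tries each
   position from `start`, or only `start` when TOY_ANCHORED is set, and
   rejects empty matches when TOY_NOTEMPTY is set. Returns 1 and fills
   ov[0], ov[1] on success; returns 0 and leaves ov untouched otherwise. */
int toy_exec(const char *subject, int length, int start, int options, int *ov)
{
  int pos;
  assert(start >= 0 && start <= length);
  for (pos = start; pos <= length; pos++)
    {
    int end = pos;
    while (end < length && subject[end] == 'a') end++;
    if (end > pos || !(options & TOY_NOTEMPTY))
      {
      ov[0] = pos;
      ov[1] = end;
      return 1;
      }
    if (options & TOY_ANCHORED) break;   /* only one attempt allowed */
    }
  return 0;
}

/* Store a (start, end) pair if there is room, and count the match. */
void record_match(int *out, int max, int *count, const int *ov)
{
  if (*count < max)
    {
    out[2 * *count] = ov[0];
    out[2 * *count + 1] = ov[1];
    }
  (*count)++;
}

/* Find all matches using the same control flow as the -g loop in
   pcredemo.c. Up to `max` pairs are stored in `out`; the return value is
   the number of matches found. */
int toy_find_all(const char *subject, int length, int *out, int max)
{
  int ov[2];
  int count = 0;

  if (!toy_exec(subject, length, 0, 0, ov)) return 0;
  record_match(out, max, &count, ov);

  for (;;)
    {
    int options = 0;
    int start = ov[1];               /* start at end of previous match */

    if (ov[0] == ov[1])              /* previous match was empty */
      {
      if (ov[0] == length) break;
      options = TOY_NOTEMPTY | TOY_ANCHORED;
      }

    if (!toy_exec(subject, length, start, options, ov))
      {
      if (options == 0) break;       /* all matches found */
      ov[1] = start + 1;             /* advance one character and retry */
      continue;
      }
    record_match(out, max, &count, ov);
    }
  return count;
}
```

For the subject "baaac" this records four matches: an empty match at offset 0, "aaa" at offsets 1 to 4, and empty matches at offsets 4 and 5.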
Modified: code/trunk/doc/html/pcregrep.html
===================================================================
--- code/trunk/doc/html/pcregrep.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/pcregrep.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -119,6 +119,12 @@
 </P>
 <br><a name="SEC4" href="#TOC1">OPTIONS</a><br>
 <P>
+The order in which some of the options appear can affect the output. For 
+example, both the <b>-h</b> and <b>-l</b> options affect the printing of file 
+names. Whichever comes later in the command line will be the one that takes 
+effect.
+</P>
+<P>
 <b>--</b>
 This terminates the list of options. It is useful if the next item on the
 command line starts with a hyphen but is not an option. This allows for the
@@ -149,10 +155,13 @@
 </P>
 <P>
 <b>-c</b>, <b>--count</b>
-Do not output individual lines; instead just output a count of the number of
-lines that would otherwise have been output. If several files are given, a
-count is output for each of them. In this mode, the <b>-A</b>, <b>-B</b>, and
-<b>-C</b> options are ignored.
+Do not output individual lines from the files that are being scanned; instead
+output the number of lines that would otherwise have been shown. If no lines
+are selected, the number zero is output. If several files are being
+scanned, a count is output for each of them. However, if the
+<b>--files-with-matches</b> option is also used, only those files whose counts
+are greater than zero are listed. When <b>-c</b> is used, the <b>-A</b>,
+<b>-B</b>, and <b>-C</b> options are ignored.
 </P>
 <P>
 <b>--colour</b>, <b>--color</b>
@@ -316,8 +325,11 @@
 <b>-l</b>, <b>--files-with-matches</b>
 Instead of outputting lines from the files, just output the names of the files
 containing lines that would have been output. Each file name is output
-once, on a separate line. Searching stops as soon as a matching line is found
-in a file.
+once, on a separate line. Searching normally stops as soon as a matching line
+is found in a file. However, if the <b>-c</b> (count) option is also used, 
+matching continues in order to obtain the correct count, and those files that 
+have at least one match are listed along with their counts. Using this option 
+with <b>-c</b> is a way of suppressing the listing of files with no matches.
 </P>
 <P>
 <b>--label</b>=<i>name</i>
@@ -462,7 +474,9 @@
 as in the GNU <b>grep</b> program. Any long option of the form
 <b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
 (PCRE terminology). However, the <b>--locale</b>, <b>-M</b>, <b>--multiline</b>,
-<b>-u</b>, and <b>--utf-8</b> options are specific to <b>pcregrep</b>.
+<b>-u</b>, and <b>--utf-8</b> options are specific to <b>pcregrep</b>. If both the 
+<b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names, 
+without counts, but <b>pcregrep</b> gives the counts.
 </P>
 <br><a name="SEC8" href="#TOC1">OPTIONS WITH DATA</a><br>
 <P>
@@ -524,7 +538,7 @@
 </P>
 <br><a name="SEC13" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 01 March 2009
+Last updated: 12 August 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcrematching.html
===================================================================
--- code/trunk/doc/html/pcrematching.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/pcrematching.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -177,13 +177,7 @@
 callouts.
 </P>
 <P>
-2. There is much better support for partial matching. The restrictions on the
-content of the pattern that apply when using the standard algorithm for partial
-matching do not apply to the alternative algorithm. For non-anchored patterns,
-the starting position of a partial match is available.
-</P>
-<P>
-3. Because the alternative algorithm scans the subject string just once, and
+2. Because the alternative algorithm scans the subject string just once, and
 never needs to backtrack, it is possible to pass very long subject strings to
 the matching function in several pieces, checking for partial matching each
 time.
@@ -215,9 +209,9 @@
 </P>
 <br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 19 April 2008
+Last updated: 25 August 2009
 <br>
-Copyright &copy; 1997-2008 University of Cambridge.
+Copyright &copy; 1997-2009 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.


Modified: code/trunk/doc/html/pcrepartial.html
===================================================================
--- code/trunk/doc/html/pcrepartial.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/pcrepartial.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -14,11 +14,16 @@
 <br>
 <ul>
 <li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>
-<li><a name="TOC2" href="#SEC2">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a>
-<li><a name="TOC3" href="#SEC3">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>
-<li><a name="TOC4" href="#SEC4">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a>
-<li><a name="TOC5" href="#SEC5">AUTHOR</a>
-<li><a name="TOC6" href="#SEC6">REVISION</a>
+<li><a name="TOC2" href="#SEC2">PARTIAL MATCHING USING pcre_exec()</a>
+<li><a name="TOC3" href="#SEC3">PARTIAL MATCHING USING pcre_dfa_exec()</a>
+<li><a name="TOC4" href="#SEC4">PARTIAL MATCHING AND WORD BOUNDARIES</a>
+<li><a name="TOC5" href="#SEC5">FORMERLY RESTRICTED PATTERNS</a>
+<li><a name="TOC6" href="#SEC6">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>
+<li><a name="TOC7" href="#SEC7">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a>
+<li><a name="TOC8" href="#SEC8">MULTI-SEGMENT MATCHING WITH pcre_exec()</a>
+<li><a name="TOC9" href="#SEC9">ISSUES WITH MULTI-SEGMENT MATCHING</a>
+<li><a name="TOC10" href="#SEC10">AUTHOR</a>
+<li><a name="TOC11" href="#SEC11">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>
 <P>
@@ -37,78 +42,155 @@
 </pre>
 If the application sees the user's keystrokes one by one, and can check that
 what has been typed so far is potentially valid, it is able to raise an error
-as soon as a mistake is made, possibly beeping and not reflecting the
-character that has been typed. This immediate feedback is likely to be a better
+as soon as a mistake is made, by beeping and not reflecting the character that
+has been typed, for example. This immediate feedback is likely to be a better
 user interface than a check that is delayed until the entire string has been
-entered.
+entered. Partial matching can also sometimes be useful when the subject string
+is very long and is not all available at once.
 </P>
 <P>
-PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
-option, which can be set when calling <b>pcre_exec()</b> or
-<b>pcre_dfa_exec()</b>. When this flag is set for <b>pcre_exec()</b>, the return
-code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
-during the matching process the last part of the subject string matched part of
-the pattern. Unfortunately, for non-anchored matching, it is not possible to
-obtain the position of the start of the partial match. No captured data is set
-when PCRE_ERROR_PARTIAL is returned.
+PCRE supports partial matching by means of the PCRE_PARTIAL_SOFT and
+PCRE_PARTIAL_HARD options, which can be set when calling <b>pcre_exec()</b> or
+<b>pcre_dfa_exec()</b>. For backwards compatibility, PCRE_PARTIAL is a synonym 
+for PCRE_PARTIAL_SOFT. The essential difference between the two options is 
+whether or not a partial match is preferred to an alternative complete match, 
+though the details differ between the two matching functions. If both options 
+are set, PCRE_PARTIAL_HARD takes precedence.
 </P>
 <P>
-When PCRE_PARTIAL is set for <b>pcre_dfa_exec()</b>, the return code
-PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
-subject is reached, there have been no complete matches, but there is still at
-least one matching possibility. The portion of the string that provided the
-partial match is set as the first matching string.
+Setting a partial matching option disables one of PCRE's optimizations. PCRE
+remembers the last literal byte in a pattern, and abandons matching immediately
+if such a byte is not present in the subject string. This optimization cannot
+be used for a subject string that might match only partially.
 </P>
+<br><a name="SEC2" href="#TOC1">PARTIAL MATCHING USING pcre_exec()</a><br>
 <P>
-Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
-last literal byte in a pattern, and abandons matching immediately if such a
-byte is not present in the subject string. This optimization cannot be used
-for a subject string that might match only partially.
+A partial match occurs during a call to <b>pcre_exec()</b> whenever the end of
+the subject string is reached successfully, but matching cannot continue
+because more characters are needed. However, at least one character must have
+been matched. (In other words, a partial match can never be an empty string.)
 </P>
-<br><a name="SEC2" href="#TOC1">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a><br>
 <P>
-Because of the way certain internal optimizations are implemented in the
-<b>pcre_exec()</b> function, the PCRE_PARTIAL option cannot be used with all
-patterns. These restrictions do not apply when <b>pcre_dfa_exec()</b> is used.
-For <b>pcre_exec()</b>, repeated single characters such as
+If PCRE_PARTIAL_SOFT is set, the partial match is remembered, but matching
+continues as normal, and other alternatives in the pattern are tried. If no
+complete match can be found, <b>pcre_exec()</b> returns PCRE_ERROR_PARTIAL
+instead of PCRE_ERROR_NOMATCH, and if there are at least two slots in the
+offsets vector, they are filled in with the offsets of the longest string that
+partially matched. Consider this pattern:
 <pre>
-  a{2,4}
+  /123\w+X|dogY/
 </pre>
-and repeated single metasequences such as
+If this is matched against the subject string "abc123dog", both
+alternatives fail to match, but the end of the subject is reached during 
+matching, so PCRE_ERROR_PARTIAL is returned instead of PCRE_ERROR_NOMATCH. The
+offsets are set to 3 and 9, identifying "123dog" as the longest partial match
+that was found. (In this example, there are two partial matches, because "dog"
+on its own partially matches the second alternative.)
+</P>
+<P>
+If PCRE_PARTIAL_HARD is set for <b>pcre_exec()</b>, it returns 
+PCRE_ERROR_PARTIAL as soon as a partial match is found, without continuing to
+search for possible complete matches. The difference between the two options
+can be illustrated by a pattern such as:
 <pre>
-  \d+
+  /dog(sbody)?/
 </pre>
-are not permitted if the maximum number of occurrences is greater than one.
-Optional items such as \d? (where the maximum is one) are permitted.
-Quantifiers with any values are permitted after parentheses, so the invalid
-examples above can be coded thus:
+This matches either "dog" or "dogsbody", greedily (that is, it prefers the 
+longer string if possible). If it is matched against the string "dog" with
+PCRE_PARTIAL_SOFT, it yields a complete match for "dog". However, if 
+PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL. On the other hand, 
+if the pattern is made ungreedy the result is different:
 <pre>
-  (a){2,4}
-  (\d)+
+  /dog(sbody)??/
 </pre>
-These constructions run more slowly, but for the kinds of application that are
-envisaged for this facility, this is not felt to be a major restriction.
+In this case the result is always a complete match because <b>pcre_exec()</b> 
+finds that first, and it never continues after finding a match. It might be 
+easier to follow this explanation by thinking of the two patterns like this:
+<pre>
+  /dog(sbody)?/    is the same as  /dogsbody|dog/
+  /dog(sbody)??/   is the same as  /dog|dogsbody/
+</pre>
+The second pattern will never match "dogsbody" when <b>pcre_exec()</b> is 
+used, because it will always find the shorter match first.
 </P>
+<br><a name="SEC3" href="#TOC1">PARTIAL MATCHING USING pcre_dfa_exec()</a><br>
 <P>
-If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
-<b>pcre_exec()</b> returns the error code PCRE_ERROR_BADPARTIAL (-13).
-You can use the PCRE_INFO_OKPARTIAL call to <b>pcre_fullinfo()</b> to find out
-if a compiled pattern can be used for partial matching.
+The <b>pcre_dfa_exec()</b> function moves along the subject string character by 
+character, without backtracking, searching for all possible matches 
+simultaneously. If the end of the subject is reached before the end of the 
+pattern, there is the possibility of a partial match, again provided that at
+least one character has matched.
 </P>
-<br><a name="SEC3" href="#TOC1">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a><br>
 <P>
+When PCRE_PARTIAL_SOFT is set, PCRE_ERROR_PARTIAL is returned only if there
+have been no complete matches. Otherwise, the complete matches are returned.
+However, if PCRE_PARTIAL_HARD is set, a partial match takes precedence over any
+complete matches. The portion of the string that provided the longest partial
+match is set as the first matching string, provided there are at least two
+slots in the offsets vector.
+</P>
+<P>
+Because <b>pcre_dfa_exec()</b> always searches for all possible matches, and 
+there is no difference between greedy and ungreedy repetition, its behaviour is
+different from <b>pcre_exec()</b> when PCRE_PARTIAL_HARD is set. Consider the
+string "dog" matched against the ungreedy pattern shown above:
+<pre>
+  /dog(sbody)??/
+</pre>
+Whereas <b>pcre_exec()</b> stops as soon as it finds the complete match for 
+"dog", <b>pcre_dfa_exec()</b> also finds the partial match for "dogsbody", and
+so returns that when PCRE_PARTIAL_HARD is set.
+</P>
+<br><a name="SEC4" href="#TOC1">PARTIAL MATCHING AND WORD BOUNDARIES</a><br>
+<P>
+If a pattern ends with one of the sequences \b or \B, which test for word
+boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive 
+results. Consider this pattern:
+<pre>
+  /\bcat\b/
+</pre>
+This matches "cat", provided there is a word boundary at either end. If the
+subject string is "the cat", the comparison of the final "t" with a following
+character cannot take place, so a partial match is found. However, 
+<b>pcre_exec()</b> carries on with normal matching, which matches \b at the end 
+of the subject when the last character is a letter, thus finding a complete 
+match. The result, therefore, is <i>not</i> PCRE_ERROR_PARTIAL. The same thing 
+happens with <b>pcre_dfa_exec()</b>, because it also finds the complete match.
+</P>
+<P>
+Using PCRE_PARTIAL_HARD in this case does yield PCRE_ERROR_PARTIAL, because 
+then the partial match takes precedence.
+</P>
+<br><a name="SEC5" href="#TOC1">FORMERLY RESTRICTED PATTERNS</a><br>
+<P>
+For releases of PCRE prior to 8.00, because of the way certain internal
+optimizations were implemented in the <b>pcre_exec()</b> function, the
+PCRE_PARTIAL option (predecessor of PCRE_PARTIAL_SOFT) could not be used with
+all patterns. From release 8.00 onwards, the restrictions no longer apply, and
+partial matching with <b>pcre_exec()</b> can be requested for any pattern.
+</P>
+<P>
+Items that were formerly restricted were repeated single characters and
+repeated metasequences. If PCRE_PARTIAL was set for a pattern that did not
+conform to the restrictions, <b>pcre_exec()</b> returned the error code
+PCRE_ERROR_BADPARTIAL (-13). This error code is no longer in use. The
+PCRE_INFO_OKPARTIAL call to <b>pcre_fullinfo()</b>, which reports whether a
+compiled pattern can be used for partial matching, now always returns 1.
+</P>
+<br><a name="SEC6" href="#TOC1">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a><br>
+<P>
 If the escape sequence \P is present in a <b>pcretest</b> data line, the
-PCRE_PARTIAL flag is used for the match. Here is a run of <b>pcretest</b> that
-uses the date example quoted above:
+PCRE_PARTIAL_SOFT option is used for the match. Here is a run of <b>pcretest</b>
+that uses the date example quoted above:
 <pre>
     re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
   data&#62; 25jun04\P
    0: 25jun04
    1: jun
   data&#62; 25dec3\P
-  Partial match
+  Partial match: 25dec3
   data&#62; 3ju\P
-  Partial match
+  Partial match: 3ju
   data&#62; 3juj\P
   No match
   data&#62; j\P
@@ -116,34 +198,23 @@
 </pre>
 The first data string is matched completely, so <b>pcretest</b> shows the
 matched substrings. The remaining four strings do not match the complete
-pattern, but the first two are partial matches. The same test, using
-<b>pcre_dfa_exec()</b> matching (by means of the \D escape sequence), produces
-the following output:
-<pre>
-    re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
-  data&#62; 25jun04\P\D
-   0: 25jun04
-  data&#62; 23dec3\P\D
-  Partial match: 23dec3
-  data&#62; 3ju\P\D
-  Partial match: 3ju
-  data&#62; 3juj\P\D
-  No match
-  data&#62; j\P\D
-  No match
-</pre>
-Notice that in this case the portion of the string that was matched is made
-available.
+pattern, but the first two are partial matches. Similar output is obtained
+when <b>pcre_dfa_exec()</b> is used.
 </P>
-<br><a name="SEC4" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a><br>
 <P>
+If the escape sequence \P is present more than once in a <b>pcretest</b> data
+line, the PCRE_PARTIAL_HARD option is set for the match.
+</P>
+<br><a name="SEC7" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a><br>
+<P>
 When a partial match has been found using <b>pcre_dfa_exec()</b>, it is possible
 to continue the match by providing additional subject data and calling
 <b>pcre_dfa_exec()</b> again with the same compiled regular expression, this
-time setting the PCRE_DFA_RESTART option. You must also pass the same working
+time setting the PCRE_DFA_RESTART option. You must pass the same working
 space as before, because this is where details of the previous partial match
 are stored. Here is an example using <b>pcretest</b>, using the \R escape
-sequence to set the PCRE_DFA_RESTART option (\P and \D are as above):
+sequence to set the PCRE_DFA_RESTART option (\D specifies the use of
+<b>pcre_dfa_exec()</b>):
 <pre>
     re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
   data&#62; 23ja\P\D
@@ -158,33 +229,62 @@
 program to do that if it needs to.
 </P>
 <P>
-You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue partial matching
-over multiple segments. This facility can be used to pass very long subject
-strings to <b>pcre_dfa_exec()</b>. However, some care is needed for certain
-types of pattern.
+You can set the PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with
+PCRE_DFA_RESTART to continue partial matching over multiple segments. This
+facility can be used to pass very long subject strings to
+<b>pcre_dfa_exec()</b>.
 </P>
+<br><a name="SEC8" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_exec()</a><br>
 <P>
+From release 8.00, <b>pcre_exec()</b> can also be used to do multi-segment 
+matching. Unlike <b>pcre_dfa_exec()</b>, it is not possible to restart the 
+previous match with a new segment of data. Instead, new data must be added to 
+the previous subject string, and the entire match re-run, starting from the 
+point where the partial match occurred. Earlier data can be discarded.
+Consider an unanchored pattern that matches dates:
+<pre>
+    re&#62; /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
+  data&#62; The date is 23ja\P
+  Partial match: 23ja
+</pre>
+At this stage, an application could discard the text preceding "23ja", add on
+text from the next segment, and call <b>pcre_exec()</b> again. Unlike 
+<b>pcre_dfa_exec()</b>, the entire matching string must always be available, and 
+the complete matching process occurs for each call, so more memory and more 
+processing time are needed.
+</P>
+<br><a name="SEC9" href="#TOC1">ISSUES WITH MULTI-SEGMENT MATCHING</a><br>
+<P>
+Certain types of pattern may give problems with multi-segment matching, 
+whichever matching function is used.
+</P>
+<P>
 1. If the pattern contains tests for the beginning or end of a line, you need
 to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
 subject string for any call does not contain the beginning or end of a line.
 </P>
 <P>
 2. If the pattern contains backward assertions (including \b or \B), you need
-to arrange for some overlap in the subject strings to allow for this. For
-example, you could pass the subject in chunks that are 500 bytes long, but in
-a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
-bytes at the start of the buffer.
+to arrange for some overlap in the subject strings to allow for them to be
+correctly tested at the start of each substring. For example, using
+<b>pcre_dfa_exec()</b>, you could pass the subject in chunks that are 500 bytes
+long, but in a buffer of 700 bytes, with the starting offset set to 200 and the
+previous 200 bytes at the start of the buffer.
 </P>
 <P>
-3. Matching a subject string that is split into multiple segments does not
-always produce exactly the same result as matching over one single long string.
-The difference arises when there are multiple matching possibilities, because a
-partial match result is given only when there are no completed matches in a
-call to <b>pcre_dfa_exec()</b>. This means that as soon as the shortest match has
+3. Matching a subject string that is split into multiple segments may not
+always produce exactly the same result as matching over one single long string,
+especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and 
+Word Boundaries" above describes an issue that arises if the pattern ends with 
+\b or \B. Another kind of difference may occur when there are multiple
+matching possibilities, because a partial match result is given only when there
+are no completed matches. This means that as soon as the shortest match has
 been found, continuation to a new subject segment is no longer possible.
-Consider this <b>pcretest</b> example:
+Consider again this <b>pcretest</b> example:
 <pre>
     re&#62; /dog(sbody)?/
+  data&#62; dogsb\P
+   0: dog    
   data&#62; do\P\D
   Partial match: do
   data&#62; gsb\R\P\D
@@ -193,26 +293,40 @@
    0: dogsbody
    1: dog
 </pre>
-The pattern matches the words "dog" or "dogsbody". When the subject is
-presented in several parts ("do" and "gsb" being the first two) the match stops
-when "dog" has been found, and it is not possible to continue. On the other
-hand, if "dogsbody" is presented as a single string, both matches are found.
+The first data line passes the string "dogsb" to <b>pcre_exec()</b>, setting the
+PCRE_PARTIAL_SOFT option. Although the string is a partial match for
+"dogsbody", the result is not PCRE_ERROR_PARTIAL, because the shorter string
+"dog" is a complete match. Similarly, when the subject is presented to
+<b>pcre_dfa_exec()</b> in several parts ("do" and "gsb" being the first two) the
+match stops when "dog" has been found, and it is not possible to continue. On
+the other hand, if "dogsbody" is presented as a single string,
+<b>pcre_dfa_exec()</b> finds both matches.
 </P>
 <P>
-Because of this phenomenon, it does not usually make sense to end a pattern
-that is going to be matched in this way with a variable repeat.
+Because of these problems, it is probably best to use PCRE_PARTIAL_HARD when
+matching multi-segment data. The example above then behaves differently:
+<pre>
+    re&#62; /dog(sbody)?/
+  data&#62; dogsb\P\P
+  Partial match: dogsb 
+  data&#62; do\P\D
+  Partial match: do
+  data&#62; gsb\R\P\P\D
+  Partial match: gsb    
+
+</PRE>
 </P>
 <P>
 4. Patterns that contain alternatives at the top level which do not all
-start with the same pattern item may not work as expected. For example,
-consider this pattern:
+start with the same pattern item may not work as expected when 
+<b>pcre_dfa_exec()</b> is used. For example, consider this pattern:
 <pre>
   1234|3789
 </pre>
 If the first part of the subject is "ABC123", a partial match of the first
 alternative is found at offset 3. There is no partial match for the second
 alternative, because such a match does not start at the same point in the
-subject string. Attempting to continue with the string "789" does not yield a
+subject string. Attempting to continue with the string "7890" does not yield a
 match because only those alternatives that match at one point in the subject
 are remembered. The problem arises because the start of the second alternative
 matches within the first alternative. There is no problem with anchored
@@ -220,9 +334,19 @@
 <pre>
   1234|ABCD
 </pre>
-where no string can be a partial match for both alternatives.
+where no string can be a partial match for both alternatives. This is not a
+problem if <b>pcre_exec()</b> is used, because the entire match has to be rerun
+each time:
+<pre>
+    re&#62; /1234|3789/
+  data&#62; ABC123\P
+  Partial match: 123
+  data&#62; 1237890
+   0: 3789
+
+</PRE>
 </P>
-<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@@ -231,11 +355,11 @@
 Cambridge CB2 3QH, England.
 <br>
 </P>
-<br><a name="SEC6" href="#TOC1">REVISION</a><br>
+<br><a name="SEC11" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 04 June 2007
+Last updated: 31 August 2009
 <br>
-Copyright &copy; 1997-2007 University of Cambridge.
+Copyright &copy; 1997-2009 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.

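The pcrepartial text on multi-segment matching with pcre_exec() says an application can discard the text preceding the partial match, add the next segment, and run the match again. Here is a minimal sketch of that buffer step, assuming the partial match start is the first offset that pcre_exec() stores in the offsets vector. The helper carry_partial() is hypothetical and is not part of PCRE. It also ignores the overlap that patterns with backward assertions would need.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE. `buf` holds `len` bytes of subject
   text, and a partial match was reported starting at `partial_start`.
   Discard the text before the partial match, append the next segment, and
   return the new subject length, or -1 if the result would not fit in
   `capacity` bytes. */
int carry_partial(char *buf, int len, int capacity, int partial_start,
                  const char *segment, int segment_len)
{
  int kept = len - partial_start;
  assert(partial_start >= 0 && partial_start <= len && segment_len >= 0);
  if (kept + segment_len > capacity) return -1;
  memmove(buf, buf + partial_start, (size_t)kept);    /* keep partial match */
  memcpy(buf + kept, segment, (size_t)segment_len);   /* add new data */
  return kept + segment_len;
}
```

Starting from "The date is 23ja" with a partial match at offset 12, adding the segment "n04 or so" leaves "23jan04 or so" as the subject for the next call.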

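Item 2 under "ISSUES WITH MULTI-SEGMENT MATCHING" suggests passing 500-byte chunks in a 700-byte buffer, with the previous 200 bytes kept at the start and the starting offset set to 200. Here is a minimal sketch of that sliding buffer, assuming the caller provides a buffer of at least overlap plus chunk_len bytes. The helper slide_window() is hypothetical and is not part of PCRE.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE. Keep the last `overlap` bytes of
   the current buffer contents at the start of `buf`, then copy in the next
   chunk after them. Updates *len to the new content length and returns the
   starting offset to pass to the matching function, which is the number of
   overlap bytes actually kept (fewer than `overlap` for the first chunks). */
int slide_window(char *buf, int *len, int overlap,
                 const char *chunk, int chunk_len)
{
  int keep;
  assert(*len >= 0 && overlap >= 0 && chunk_len >= 0);
  keep = (*len < overlap) ? *len : overlap;
  memmove(buf, buf + (*len - keep), (size_t)keep);   /* previous tail */
  memcpy(buf + keep, chunk, (size_t)chunk_len);      /* new chunk */
  *len = keep + chunk_len;
  return keep;
}
```

With an overlap of 200 and chunks of 500 bytes, the first call returns 0 and every later call returns 200, which matches the layout described in the text.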
Modified: code/trunk/doc/html/pcreposix.html
===================================================================
--- code/trunk/doc/html/pcreposix.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/pcreposix.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -143,6 +143,11 @@
 is public: <i>re_nsub</i> contains the number of capturing subpatterns in
 the regular expression. Various error codes are defined in the header file.
 </P>
+<P>
+NOTE: If the yield of <b>regcomp()</b> is non-zero, you must not attempt to
+use the contents of the <i>preg</i> structure. If, for example, you pass it to
+<b>regexec()</b>, the result is undefined and your program is likely to crash.
+</P>
 <br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
 <P>
 This area is not simple, because POSIX and Perl take different views of things.
@@ -257,7 +262,7 @@
 </P>
 <br><a name="SEC9" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 11 March 2009
+Last updated: 15 August 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>


Modified: code/trunk/doc/html/pcresample.html
===================================================================
--- code/trunk/doc/html/pcresample.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/pcresample.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -17,7 +17,11 @@
 </b><br>
 <P>
 A simple, complete demonstration program, to get you started with using PCRE,
-is supplied in the file <i>pcredemo.c</i> in the PCRE distribution.
+is supplied in the file <i>pcredemo.c</i> in the PCRE distribution. A listing of
+this program is given in the
+<a href="pcredemo.html"><b>pcredemo</b></a>
+documentation. If you do not have a copy of the PCRE distribution, you can save 
+this listing to re-create <i>pcredemo.c</i>.
 </P>
 <P>
 The program compiles the regular expression that is its first argument, and
@@ -55,13 +59,15 @@
 Note that there is a much more comprehensive test program, called
 <a href="pcretest.html"><b>pcretest</b>,</a>
 which supports many more facilities for testing regular expressions and the
-PCRE library. The <b>pcredemo</b> program is provided as a simple coding
-example.
+PCRE library. The 
+<a href="pcredemo.html"><b>pcredemo</b></a>
+program is provided as a simple coding example.
 </P>
 <P>
-On some operating systems (e.g. Solaris), when PCRE is not installed in the
-standard library directory, you may get an error like this when you try to run
-<b>pcredemo</b>:
+When you try to run
+<a href="pcredemo.html"><b>pcredemo</b></a>
+when PCRE is not installed in the standard library directory, you may get an
+error like this on some operating systems (e.g. Solaris):
 <pre>
   ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
 </pre>
@@ -87,9 +93,9 @@
 REVISION
 </b><br>
 <P>
-Last updated: 23 January 2008
+Last updated: 01 September 2009
 <br>
-Copyright &copy; 1997-2008 University of Cambridge.
+Copyright &copy; 1997-2009 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.


Modified: code/trunk/doc/html/pcretest.html
===================================================================
--- code/trunk/doc/html/pcretest.html    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/html/pcretest.html    2009-09-01 16:10:16 UTC (rev 429)
@@ -372,7 +372,8 @@
   \M         discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
   \N         pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
   \Odd       set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits)
-  \P         pass the PCRE_PARTIAL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \P         pass the PCRE_PARTIAL_SOFT option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>; if used twice, pass the
+               PCRE_PARTIAL_HARD option 
   \Qdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
   \R         pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b>
   \S         output details of memory get/free calls during matching
@@ -453,10 +454,10 @@
 <P>
 When a match succeeds, pcretest outputs the list of captured substrings that
 <b>pcre_exec()</b> returns, starting with number 0 for the string that matched
-the whole pattern. Otherwise, it outputs "No match" or "Partial match"
-when <b>pcre_exec()</b> returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
-respectively, and otherwise the PCRE negative error number. Here is an example
-of an interactive <b>pcretest</b> run.
+the whole pattern. Otherwise, it outputs "No match" or "Partial match:"
+followed by the partially matching substring when <b>pcre_exec()</b> returns
+PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL, respectively, and otherwise the PCRE
+negative error number. Here is an example of an interactive <b>pcretest</b> run.
 <pre>
   $ pcretest
   PCRE version 7.0 30-Nov-2006
@@ -536,7 +537,9 @@
    2: tan
 </pre>
 (Using the normal matching function on this data finds only "tang".) The
-longest matching string is always given first (and numbered zero).
+longest matching string is always given first (and numbered zero). After a
+PCRE_ERROR_PARTIAL return, the output is "Partial match:", followed by the 
+partially matching substring.
 </P>
 <P>
 If <b>/g</b> is present on the pattern, the search for further matches resumes
@@ -703,7 +706,7 @@
 </P>
 <br><a name="SEC15" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 10 March 2009
+Last updated: 29 August 2009
 <br>
 Copyright &copy; 1997-2009 University of Cambridge.
 <br>


Modified: code/trunk/doc/index.html.src
===================================================================
--- code/trunk/doc/index.html.src    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/index.html.src    2009-09-01 16:10:16 UTC (rev 429)
@@ -36,6 +36,9 @@
 <tr><td><a href="pcrecpp.html">pcrecpp</a></td>
     <td>&nbsp;&nbsp;The C++ wrapper for the PCRE library</td></tr>


+<tr><td><a href="pcredemo.html">pcredemo</a></td>
+    <td>&nbsp;&nbsp;A demonstration C program that uses the PCRE library</td></tr>
+
 <tr><td><a href="pcregrep.html">pcregrep</a></td>
     <td>&nbsp;&nbsp;The <b>pcregrep</b> command</td></tr>


@@ -58,7 +61,7 @@
     <td>&nbsp;&nbsp;How to save and re-use compiled patterns</td></tr>


 <tr><td><a href="pcresample.html">pcresample</a></td>
-    <td>&nbsp;&nbsp;Description of the sample program</td></tr>
+    <td>&nbsp;&nbsp;Discussion of the pcredemo program</td></tr>


 <tr><td><a href="pcrestack.html">pcrestack</a></td>
     <td>&nbsp;&nbsp;Discussion of PCRE's stack usage</td></tr>


Modified: code/trunk/doc/pcre.3
===================================================================
--- code/trunk/doc/pcre.3    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/pcre.3    2009-09-01 16:10:16 UTC (rev 429)
@@ -11,8 +11,8 @@
 support for certain .NET and Oniguruma syntax items, and there is an option for
 requesting some minor changes that give better JavaScript compatibility.
 .P
-The current implementation of PCRE (release 7.x) corresponds approximately with
-Perl 5.10, including support for UTF-8 encoded strings and Unicode general
+The current implementation of PCRE (release 8.xx) corresponds approximately
+with Perl 5.10, including support for UTF-8 encoded strings and Unicode general
 category properties. However, UTF-8 and Unicode support has to be explicitly
 enabled; it is not the default. The Unicode tables correspond to Unicode
 release 5.1.
@@ -83,8 +83,8 @@
 The user documentation for PCRE comprises a number of different sections. In
 the "man" format, each of these is a separate "man page". In the HTML format,
 each is a separate page, linked from the index page. In the plain text format,
-all the sections are concatenated, for ease of searching. The sections are as
-follows:
+all the sections, except the \fBpcredemo\fP section, are concatenated, for ease
+of searching. The sections are as follows:
 .sp
   pcre              this document
   pcre-config       show PCRE installation configuration information
@@ -93,6 +93,7 @@
   pcrecallout       details of the callout feature
   pcrecompat        discussion of Perl compatibility
   pcrecpp           details of the C++ wrapper
+  pcredemo          a demonstration C program that uses PCRE
   pcregrep          description of the \fBpcregrep\fP command
   pcrematching      discussion of the two matching algorithms
   pcrepartial       details of the partial matching facility
@@ -103,7 +104,7 @@
   pcreperform       discussion of performance issues
   pcreposix         the POSIX-compatible C API
   pcreprecompile    details of saving and re-using precompiled patterns
-  pcresample        discussion of the sample program
+  pcresample        discussion of the pcredemo program
   pcrestack         discussion of stack usage
   pcretest          description of the \fBpcretest\fP testing command
 .sp
@@ -291,6 +292,6 @@
 .rs
 .sp
 .nf
-Last updated: 11 April 2009
+Last updated: 01 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre.txt
===================================================================
--- code/trunk/doc/pcre.txt    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/pcre.txt    2009-09-01 16:10:16 UTC (rev 429)
@@ -2,8 +2,9 @@
 This file contains a concatenation of the PCRE man pages, converted to plain
 text format for ease of searching with a text editor, or for use on systems
 that do not have a man page processor. The small individual files that give
-synopses of each function in the library have not been included. There are
-separate text files for the pcregrep and pcretest commands.
+synopses of each function in the library have not been included. Neither has 
+the pcredemo program. There are separate text files for the pcregrep and
+pcretest commands.
 -----------------------------------------------------------------------------



@@ -24,7 +25,7 @@
        tax items, and there is an option for  requesting  some  minor  changes
        that give better JavaScript compatibility.


-       The  current  implementation of PCRE (release 7.x) corresponds approxi-
+       The  current implementation of PCRE (release 8.xx) corresponds approxi-
        mately with Perl 5.10, including support for UTF-8 encoded strings  and
        Unicode general category properties. However, UTF-8 and Unicode support
        has to be explicitly enabled; it is not the default. The Unicode tables
@@ -71,8 +72,9 @@
        The user documentation for PCRE comprises a number  of  different  sec-
        tions.  In the "man" format, each of these is a separate "man page". In
        the HTML format, each is a separate page, linked from the  index  page.
-       In  the  plain text format, all the sections are concatenated, for ease
-       of searching. The sections are as follows:
+       In  the  plain  text format, all the sections, except the pcredemo sec-
+       tion, are concatenated, for ease of searching. The sections are as fol-
+       lows:


          pcre              this document
          pcre-config       show PCRE installation configuration information
@@ -81,6 +83,7 @@
          pcrecallout       details of the callout feature
          pcrecompat        discussion of Perl compatibility
          pcrecpp           details of the C++ wrapper
+         pcredemo          a demonstration C program that uses PCRE
          pcregrep          description of the pcregrep command
          pcrematching      discussion of the two matching algorithms
          pcrepartial       details of the partial matching facility
@@ -90,25 +93,25 @@
          pcreperform       discussion of performance issues
          pcreposix         the POSIX-compatible C API
          pcreprecompile    details of saving and re-using precompiled patterns
-         pcresample        discussion of the sample program
+         pcresample        discussion of the pcredemo program
          pcrestack         discussion of stack usage
          pcretest          description of the pcretest testing command


-       In addition, in the "man" and HTML formats, there is a short  page  for
+       In  addition,  in the "man" and HTML formats, there is a short page for
        each C library function, listing its arguments and results.



LIMITATIONS

-       There  are some size limitations in PCRE but it is hoped that they will
+       There are some size limitations in PCRE but it is hoped that they  will
        never in practice be relevant.


-       The maximum length of a compiled pattern is 65539 (sic) bytes  if  PCRE
+       The  maximum  length of a compiled pattern is 65539 (sic) bytes if PCRE
        is compiled with the default internal linkage size of 2. If you want to
-       process regular expressions that are truly enormous,  you  can  compile
-       PCRE  with  an  internal linkage size of 3 or 4 (see the README file in
-       the source distribution and the pcrebuild documentation  for  details).
-       In  these  cases the limit is substantially larger.  However, the speed
+       process  regular  expressions  that are truly enormous, you can compile
+       PCRE with an internal linkage size of 3 or 4 (see the  README  file  in
+       the  source  distribution and the pcrebuild documentation for details).
+       In these cases the limit is substantially larger.  However,  the  speed
        of execution is slower.


        All values in repeating quantifiers must be less than 65536.
@@ -119,131 +122,131 @@
        The maximum length of name for a named subpattern is 32 characters, and
        the maximum number of named subpatterns is 10000.


-       The maximum length of a subject string is the largest  positive  number
-       that  an integer variable can hold. However, when using the traditional
+       The  maximum  length of a subject string is the largest positive number
+       that an integer variable can hold. However, when using the  traditional
        matching function, PCRE uses recursion to handle subpatterns and indef-
-       inite  repetition.  This means that the available stack space may limit
+       inite repetition.  This means that the available stack space may  limit
        the size of a subject string that can be processed by certain patterns.
        For a discussion of stack issues, see the pcrestack documentation.



UTF-8 AND UNICODE PROPERTY SUPPORT

-       From  release  3.3,  PCRE  has  had  some support for character strings
-       encoded in the UTF-8 format. For release 4.0 this was greatly  extended
-       to  cover  most common requirements, and in release 5.0 additional sup-
+       From release 3.3, PCRE has  had  some  support  for  character  strings
+       encoded  in the UTF-8 format. For release 4.0 this was greatly extended
+       to cover most common requirements, and in release 5.0  additional  sup-
        port for Unicode general category properties was added.


-       In order process UTF-8 strings, you must build PCRE  to  include  UTF-8
-       support  in  the  code,  and, in addition, you must call pcre_compile()
-       with the PCRE_UTF8 option flag, or the  pattern  must  start  with  the
-       sequence  (*UTF8).  When  either of these is the case, both the pattern
-       and any subject strings that are matched  against  it  are  treated  as
+       In  order  process  UTF-8 strings, you must build PCRE to include UTF-8
+       support in the code, and, in addition,  you  must  call  pcre_compile()
+       with  the  PCRE_UTF8  option  flag,  or the pattern must start with the
+       sequence (*UTF8). When either of these is the case,  both  the  pattern
+       and  any  subject  strings  that  are matched against it are treated as
        UTF-8 strings instead of just strings of bytes.


-       If  you compile PCRE with UTF-8 support, but do not use it at run time,
-       the library will be a bit bigger, but the additional run time  overhead
+       If you compile PCRE with UTF-8 support, but do not use it at run  time,
+       the  library will be a bit bigger, but the additional run time overhead
        is limited to testing the PCRE_UTF8 flag occasionally, so should not be
        very big.


        If PCRE is built with Unicode character property support (which implies
-       UTF-8  support),  the  escape sequences \p{..}, \P{..}, and \X are sup-
+       UTF-8 support), the escape sequences \p{..}, \P{..}, and  \X  are  sup-
        ported.  The available properties that can be tested are limited to the
-       general  category  properties such as Lu for an upper case letter or Nd
-       for a decimal number, the Unicode script names such as Arabic  or  Han,
-       and  the  derived  properties  Any  and L&. A full list is given in the
+       general category properties such as Lu for an upper case letter  or  Nd
+       for  a  decimal number, the Unicode script names such as Arabic or Han,
+       and the derived properties Any and L&. A full  list  is  given  in  the
        pcrepattern documentation. Only the short names for properties are sup-
-       ported.  For example, \p{L} matches a letter. Its Perl synonym, \p{Let-
-       ter}, is not supported.  Furthermore,  in  Perl,  many  properties  may
-       optionally  be  prefixed by "Is", for compatibility with Perl 5.6. PCRE
+       ported. For example, \p{L} matches a letter. Its Perl synonym,  \p{Let-
+       ter},  is  not  supported.   Furthermore,  in Perl, many properties may
+       optionally be prefixed by "Is", for compatibility with Perl  5.6.  PCRE
        does not support this.


    Validity of UTF-8 strings


-       When you set the PCRE_UTF8 flag, the strings  passed  as  patterns  and
+       When  you  set  the  PCRE_UTF8 flag, the strings passed as patterns and
        subjects are (by default) checked for validity on entry to the relevant
-       functions. From release 7.3 of PCRE, the check is according  the  rules
-       of  RFC  3629, which are themselves derived from the Unicode specifica-
-       tion. Earlier releases of PCRE followed the rules of  RFC  2279,  which
-       allows  the  full range of 31-bit values (0 to 0x7FFFFFFF). The current
+       functions.  From  release 7.3 of PCRE, the check is according the rules
+       of RFC 3629, which are themselves derived from the  Unicode  specifica-
+       tion.  Earlier  releases  of PCRE followed the rules of RFC 2279, which
+       allows the full range of 31-bit values (0 to 0x7FFFFFFF).  The  current
        check allows only values in the range U+0 to U+10FFFF, excluding U+D800
        to U+DFFF.


-       The  excluded  code  points are the "Low Surrogate Area" of Unicode, of
-       which the Unicode Standard says this: "The Low Surrogate Area does  not
-       contain  any  character  assignments,  consequently  no  character code
+       The excluded code points are the "Low Surrogate Area"  of  Unicode,  of
+       which  the Unicode Standard says this: "The Low Surrogate Area does not
+       contain any  character  assignments,  consequently  no  character  code
        charts or namelists are provided for this area. Surrogates are reserved
-       for  use  with  UTF-16 and then must be used in pairs." The code points
-       that are encoded by UTF-16 pairs  are  available  as  independent  code
-       points  in  the  UTF-8  encoding.  (In other words, the whole surrogate
+       for use with UTF-16 and then must be used in pairs."  The  code  points
+       that  are  encoded  by  UTF-16  pairs are available as independent code
+       points in the UTF-8 encoding. (In  other  words,  the  whole  surrogate
        thing is a fudge for UTF-16 which unfortunately messes up UTF-8.)


-       If an  invalid  UTF-8  string  is  passed  to  PCRE,  an  error  return
+       If  an  invalid  UTF-8  string  is  passed  to  PCRE,  an  error return
        (PCRE_ERROR_BADUTF8) is given. In some situations, you may already know
        that your strings are valid, and therefore want to skip these checks in
        order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag at
-       compile time or at run time, PCRE assumes that the pattern  or  subject
-       it  is  given  (respectively)  contains only valid UTF-8 codes. In this
+       compile  time  or at run time, PCRE assumes that the pattern or subject
+       it is given (respectively) contains only valid  UTF-8  codes.  In  this
        case, it does not diagnose an invalid UTF-8 string.


-       If you pass an invalid UTF-8 string  when  PCRE_NO_UTF8_CHECK  is  set,
-       what  happens  depends on why the string is invalid. If the string con-
+       If  you  pass  an  invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set,
+       what happens depends on why the string is invalid. If the  string  con-
        forms to the "old" definition of UTF-8 (RFC 2279), it is processed as a
-       string  of  characters  in  the  range 0 to 0x7FFFFFFF. In other words,
+       string of characters in the range 0  to  0x7FFFFFFF.  In  other  words,
        apart from the initial validity test, PCRE (when in UTF-8 mode) handles
-       strings  according  to  the more liberal rules of RFC 2279. However, if
-       the string does not even conform to RFC 2279, the result is  undefined.
+       strings according to the more liberal rules of RFC  2279.  However,  if
+       the  string does not even conform to RFC 2279, the result is undefined.
        Your program may crash.


-       If  you  want  to  process  strings  of  values  in the full range 0 to
-       0x7FFFFFFF, encoded in a UTF-8-like manner as per the old RFC, you  can
+       If you want to process strings  of  values  in  the  full  range  0  to
+       0x7FFFFFFF,  encoded in a UTF-8-like manner as per the old RFC, you can
        set PCRE_NO_UTF8_CHECK to bypass the more restrictive test. However, in
        this situation, you will have to apply your own validity check.


    General comments about UTF-8 mode


-       1. An unbraced hexadecimal escape sequence (such  as  \xb3)  matches  a
+       1.  An  unbraced  hexadecimal  escape sequence (such as \xb3) matches a
        two-byte UTF-8 character if the value is greater than 127.


-       2.  Octal  numbers  up to \777 are recognized, and match two-byte UTF-8
+       2. Octal numbers up to \777 are recognized, and  match  two-byte  UTF-8
        characters for values greater than \177.


-       3. Repeat quantifiers apply to complete UTF-8 characters, not to  indi-
+       3.  Repeat quantifiers apply to complete UTF-8 characters, not to indi-
        vidual bytes, for example: \x{100}{3}.


-       4.  The dot metacharacter matches one UTF-8 character instead of a sin-
+       4. The dot metacharacter matches one UTF-8 character instead of a  sin-
        gle byte.


-       5. The escape sequence \C can be used to match a single byte  in  UTF-8
-       mode,  but  its  use can lead to some strange effects. This facility is
+       5.  The  escape sequence \C can be used to match a single byte in UTF-8
+       mode, but its use can lead to some strange effects.  This  facility  is
        not available in the alternative matching function, pcre_dfa_exec().


-       6. The character escapes \b, \B, \d, \D, \s, \S, \w, and  \W  correctly
-       test  characters of any code value, but the characters that PCRE recog-
-       nizes as digits, spaces, or word characters  remain  the  same  set  as
+       6.  The  character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
+       test characters of any code value, but the characters that PCRE  recog-
+       nizes  as  digits,  spaces,  or  word characters remain the same set as
        before, all with values less than 256. This remains true even when PCRE
-       includes Unicode property support, because to do otherwise  would  slow
-       down  PCRE in many common cases. If you really want to test for a wider
-       sense of, say, "digit", you must use Unicode  property  tests  such  as
-       \p{Nd}.  Note  that  this  also applies to \b, because it is defined in
+       includes  Unicode  property support, because to do otherwise would slow
+       down PCRE in many common cases. If you really want to test for a  wider
+       sense  of,  say,  "digit",  you must use Unicode property tests such as
+       \p{Nd}. Note that this also applies to \b, because  it  is  defined  in
        terms of \w and \W.


-       7. Similarly, characters that match the POSIX named  character  classes
+       7.  Similarly,  characters that match the POSIX named character classes
        are all low-valued characters.


-       8.  However,  the Perl 5.10 horizontal and vertical whitespace matching
+       8. However, the Perl 5.10 horizontal and vertical  whitespace  matching
        escapes (\h, \H, \v, and \V) do match all the appropriate Unicode char-
        acters.


-       9.  Case-insensitive  matching  applies only to characters whose values
-       are less than 128, unless PCRE is built with Unicode property  support.
-       Even  when  Unicode  property support is available, PCRE still uses its
-       own character tables when checking the case of  low-valued  characters,
-       so  as not to degrade performance.  The Unicode property information is
+       9. Case-insensitive matching applies only to  characters  whose  values
+       are  less than 128, unless PCRE is built with Unicode property support.
+       Even when Unicode property support is available, PCRE  still  uses  its
+       own  character  tables when checking the case of low-valued characters,
+       so as not to degrade performance.  The Unicode property information  is
        used only for characters with higher values. Even when Unicode property
        support is available, PCRE supports case-insensitive matching only when
-       there is a one-to-one mapping between a letter's  cases.  There  are  a
-       small  number  of  many-to-one  mappings in Unicode; these are not sup-
+       there  is  a  one-to-one  mapping between a letter's cases. There are a
+       small number of many-to-one mappings in Unicode;  these  are  not  sup-
        ported by PCRE.



@@ -253,18 +256,18 @@
        University Computing Service
        Cambridge CB2 3QH, England.


-       Putting an actual email address here seems to have been a spam  magnet,
-       so  I've  taken  it away. If you want to email me, use my two initials,
+       Putting  an actual email address here seems to have been a spam magnet,
+       so I've taken it away. If you want to email me, use  my  two  initials,
        followed by the two digits 10, at the domain cam.ac.uk.



REVISION

-       Last updated: 11 April 2009
+       Last updated: 01 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCREBUILD(3)                                                      PCREBUILD(3)



@@ -590,8 +593,8 @@
        Last updated: 17 March 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCREMATCHING(3)                                                PCREMATCHING(3)



@@ -751,13 +754,7 @@
        more than one match using the standard algorithm, you have to do kludgy
        things with callouts.


-       2.  There is much better support for partial matching. The restrictions
-       on the content of the pattern that apply when using the standard  algo-
-       rithm  for  partial matching do not apply to the alternative algorithm.
-       For non-anchored patterns, the starting position of a partial match  is
-       available.
-
-       3.  Because  the  alternative  algorithm  scans the subject string just
+       2.  Because  the  alternative  algorithm  scans the subject string just
        once, and never needs to backtrack, it is possible to  pass  very  long
        subject  strings  to  the matching function in several pieces, checking
        for partial matching each time.
@@ -786,11 +783,11 @@


REVISION

-       Last updated: 19 April 2008
-       Copyright (c) 1997-2008 University of Cambridge.
+       Last updated: 25 August 2009
+       Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCREAPI(3)                                                          PCREAPI(3)



@@ -898,18 +895,19 @@
        pcre_exec() are used for compiling and matching regular expressions  in
        a  Perl-compatible  manner. A sample program that demonstrates the sim-
        plest way of using them is provided in the file  called  pcredemo.c  in
-       the  source distribution. The pcresample documentation describes how to
-       compile and run it.
+       the PCRE source distribution. A listing of this program is given in the
+       pcredemo documentation, and the pcresample documentation describes  how
+       to compile and run it.


        A second matching function, pcre_dfa_exec(), which is not Perl-compati-
-       ble,  is  also provided. This uses a different algorithm for the match-
-       ing. The alternative algorithm finds all possible matches (at  a  given
-       point  in  the subject), and scans the subject just once. However, this
+       ble, is also provided. This uses a different algorithm for  the  match-
+       ing.  The  alternative algorithm finds all possible matches (at a given
+       point in the subject), and scans the subject just once.  However,  this
        algorithm does not return captured substrings. A description of the two
-       matching  algorithms and their advantages and disadvantages is given in
+       matching algorithms and their advantages and disadvantages is given  in
        the pcrematching documentation.


-       In addition to the main compiling and  matching  functions,  there  are
+       In  addition  to  the  main compiling and matching functions, there are
        convenience functions for extracting captured substrings from a subject
        string that is matched by pcre_exec(). They are:


@@ -924,91 +922,91 @@
        pcre_free_substring() and pcre_free_substring_list() are also provided,
        to free the memory used for extracted strings.


-       The  function  pcre_maketables()  is  used  to build a set of character
-       tables  in  the  current  locale   for   passing   to   pcre_compile(),
-       pcre_exec(),  or  pcre_dfa_exec(). This is an optional facility that is
-       provided for specialist use.  Most  commonly,  no  special  tables  are
-       passed,  in  which case internal tables that are generated when PCRE is
+       The function pcre_maketables() is used to  build  a  set  of  character
+       tables   in   the   current   locale  for  passing  to  pcre_compile(),
+       pcre_exec(), or pcre_dfa_exec(). This is an optional facility  that  is
+       provided  for  specialist  use.  Most  commonly,  no special tables are
+       passed, in which case internal tables that are generated when  PCRE  is
        built are used.


-       The function pcre_fullinfo() is used to find out  information  about  a
-       compiled  pattern; pcre_info() is an obsolete version that returns only
-       some of the available information, but is retained for  backwards  com-
-       patibility.   The function pcre_version() returns a pointer to a string
+       The  function  pcre_fullinfo()  is used to find out information about a
+       compiled pattern; pcre_info() is an obsolete version that returns  only
+       some  of  the available information, but is retained for backwards com-
+       patibility.  The function pcre_version() returns a pointer to a  string
        containing the version of PCRE and its date of release.


-       The function pcre_refcount() maintains a  reference  count  in  a  data
-       block  containing  a compiled pattern. This is provided for the benefit
+       The  function  pcre_refcount()  maintains  a  reference count in a data
+       block containing a compiled pattern. This is provided for  the  benefit
        of object-oriented applications.


-       The global variables pcre_malloc and pcre_free  initially  contain  the
-       entry  points  of  the  standard malloc() and free() functions, respec-
+       The  global  variables  pcre_malloc and pcre_free initially contain the
+       entry points of the standard malloc()  and  free()  functions,  respec-
        tively. PCRE calls the memory management functions via these variables,
-       so  a  calling  program  can replace them if it wishes to intercept the
+       so a calling program can replace them if it  wishes  to  intercept  the
        calls. This should be done before calling any PCRE functions.


-       The global variables pcre_stack_malloc  and  pcre_stack_free  are  also
-       indirections  to  memory  management functions. These special functions
-       are used only when PCRE is compiled to use  the  heap  for  remembering
+       The  global  variables  pcre_stack_malloc  and pcre_stack_free are also
+       indirections to memory management functions.  These  special  functions
+       are  used  only  when  PCRE is compiled to use the heap for remembering
        data, instead of recursive function calls, when running the pcre_exec()
-       function. See the pcrebuild documentation for  details  of  how  to  do
-       this.  It  is  a non-standard way of building PCRE, for use in environ-
-       ments that have limited stacks. Because of the greater  use  of  memory
-       management,  it  runs  more  slowly. Separate functions are provided so
-       that special-purpose external code can be  used  for  this  case.  When
-       used,  these  functions  are always called in a stack-like manner (last
-       obtained, first freed), and always for memory blocks of the same  size.
-       There  is  a discussion about PCRE's stack usage in the pcrestack docu-
+       function.  See  the  pcrebuild  documentation  for details of how to do
+       this. It is a non-standard way of building PCRE, for  use  in  environ-
+       ments  that  have  limited stacks. Because of the greater use of memory
+       management, it runs more slowly. Separate  functions  are  provided  so
+       that  special-purpose  external  code  can  be used for this case. When
+       used, these functions are always called in a  stack-like  manner  (last
+       obtained,  first freed), and always for memory blocks of the same size.
+       There is a discussion about PCRE's stack usage in the  pcrestack  docu-
        mentation.


        The global variable pcre_callout initially contains NULL. It can be set
-       by  the  caller  to  a "callout" function, which PCRE will then call at
-       specified points during a matching operation. Details are given in  the
+       by the caller to a "callout" function, which PCRE  will  then  call  at
+       specified  points during a matching operation. Details are given in the
        pcrecallout documentation.



NEWLINES

-       PCRE  supports five different conventions for indicating line breaks in
-       strings: a single CR (carriage return) character, a  single  LF  (line-
+       PCRE supports five different conventions for indicating line breaks  in
+       strings:  a  single  CR (carriage return) character, a single LF (line-
        feed) character, the two-character sequence CRLF, any of the three pre-
-       ceding, or any Unicode newline sequence. The Unicode newline  sequences
-       are  the  three just mentioned, plus the single characters VT (vertical
-       tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS  (line
+       ceding,  or any Unicode newline sequence. The Unicode newline sequences
+       are the three just mentioned, plus the single characters  VT  (vertical
+       tab,  U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
        separator, U+2028), and PS (paragraph separator, U+2029).


-       Each  of  the first three conventions is used by at least one operating
-       system as its standard newline sequence. When PCRE is built, a  default
-       can  be  specified.  The default default is LF, which is the Unix stan-
-       dard. When PCRE is run, the default can be overridden,  either  when  a
+       Each of the first three conventions is used by at least  one  operating
+       system  as its standard newline sequence. When PCRE is built, a default
+       can be specified.  The default default is LF, which is the  Unix  stan-
+       dard.  When  PCRE  is run, the default can be overridden, either when a
        pattern is compiled, or when it is matched.


        At compile time, the newline convention can be specified by the options
-       argument of pcre_compile(), or it can be specified by special  text  at
+       argument  of  pcre_compile(), or it can be specified by special text at
        the start of the pattern itself; this overrides any other settings. See
        the pcrepattern page for details of the special character sequences.


        In the PCRE documentation the word "newline" is used to mean "the char-
-       acter  or pair of characters that indicate a line break". The choice of
-       newline convention affects the handling of  the  dot,  circumflex,  and
+       acter or pair of characters that indicate a line break". The choice  of
+       newline  convention  affects  the  handling of the dot, circumflex, and
        dollar metacharacters, the handling of #-comments in /x mode, and, when
-       CRLF is a recognized line ending sequence, the match position  advance-
+       CRLF  is a recognized line ending sequence, the match position advance-
        ment for a non-anchored pattern. There is more detail about this in the
        section on pcre_exec() options below.


-       The choice of newline convention does not affect the interpretation  of
-       the  \n  or  \r  escape  sequences, nor does it affect what \R matches,
+       The  choice of newline convention does not affect the interpretation of
+       the \n or \r escape sequences, nor does  it  affect  what  \R  matches,
        which is controlled in a similar way, but by separate options.



MULTITHREADING

-       The PCRE functions can be used in  multi-threading  applications,  with
+       The  PCRE  functions  can be used in multi-threading applications, with
        the  proviso  that  the  memory  management  functions  pointed  to  by
        pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the
        callout function pointed to by pcre_callout, are shared by all threads.


-       The  compiled form of a regular expression is not altered during match-
+       The compiled form of a regular expression is not altered during  match-
        ing, so the same compiled pattern can safely be used by several threads
        at once.
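
The hunk above says a calling program can replace pcre_malloc and pcre_free to intercept PCRE's memory calls. A minimal sketch of such an interception follows. The demo_ names are hypothetical and not part of PCRE; the only assumption is that the two variables hold pointers to functions shaped like malloc() and free().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counters for this sketch; not part of PCRE. */
static unsigned long demo_alloc_calls = 0;
static unsigned long demo_free_calls = 0;

/* Same shape as malloc(): counts the call, then delegates. */
void *demo_counting_malloc(size_t size)
{
    demo_alloc_calls++;
    return malloc(size);
}

/* Same shape as free(): counts the call, then delegates. */
void demo_counting_free(void *block)
{
    demo_free_calls++;
    free(block);
}

/* Number of blocks obtained but not yet released. */
long demo_outstanding(void)
{
    return (long)demo_alloc_calls - (long)demo_free_calls;
}
```

In a program that includes pcre.h, the wrappers would be installed with pcre_malloc = demo_counting_malloc; and pcre_free = demo_counting_free; before any other PCRE function is called, as the text requires.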


@@ -1016,10 +1014,10 @@
SAVING PRECOMPILED PATTERNS FOR LATER USE

        The compiled form of a regular expression can be saved and re-used at a
-       later time, possibly by a different program, and even on a  host  other
-       than  the  one  on  which  it  was  compiled.  Details are given in the
-       pcreprecompile documentation. However, compiling a  regular  expression
-       with  one version of PCRE for use with a different version is not guar-
+       later  time,  possibly by a different program, and even on a host other
+       than the one on which  it  was  compiled.  Details  are  given  in  the
+       pcreprecompile  documentation.  However, compiling a regular expression
+       with one version of PCRE for use with a different version is not  guar-
        anteed to work and may cause crashes.



@@ -1027,79 +1025,79 @@

        int pcre_config(int what, void *where);


-       The function pcre_config() makes it possible for a PCRE client to  dis-
+       The  function pcre_config() makes it possible for a PCRE client to dis-
        cover which optional features have been compiled into the PCRE library.
-       The pcrebuild documentation has more details about these optional  fea-
+       The  pcrebuild documentation has more details about these optional fea-
        tures.


-       The  first  argument  for pcre_config() is an integer, specifying which
+       The first argument for pcre_config() is an  integer,  specifying  which
        information is required; the second argument is a pointer to a variable
-       into  which  the  information  is  placed. The following information is
+       into which the information is  placed.  The  following  information  is
        available:


          PCRE_CONFIG_UTF8


-       The output is an integer that is set to one if UTF-8 support is  avail-
+       The  output is an integer that is set to one if UTF-8 support is avail-
        able; otherwise it is set to zero.


          PCRE_CONFIG_UNICODE_PROPERTIES


-       The  output  is  an  integer  that is set to one if support for Unicode
+       The output is an integer that is set to  one  if  support  for  Unicode
        character properties is available; otherwise it is set to zero.


          PCRE_CONFIG_NEWLINE


-       The output is an integer whose value specifies  the  default  character
-       sequence  that is recognized as meaning "newline". The four values that
+       The  output  is  an integer whose value specifies the default character
+       sequence that is recognized as meaning "newline". The four values  that
        are supported are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF,
-       and  -1  for  ANY.  Though they are derived from ASCII, the same values
+       and -1 for ANY.  Though they are derived from ASCII,  the  same  values
        are returned in EBCDIC environments. The default should normally corre-
        spond to the standard sequence for your operating system.


          PCRE_CONFIG_BSR


        The output is an integer whose value indicates what character sequences
-       the \R escape sequence matches by default. A value of 0 means  that  \R
-       matches  any  Unicode  line ending sequence; a value of 1 means that \R
+       the  \R  escape sequence matches by default. A value of 0 means that \R
+       matches any Unicode line ending sequence; a value of 1  means  that  \R
        matches only CR, LF, or CRLF. The default can be overridden when a pat-
        tern is compiled or matched.


          PCRE_CONFIG_LINK_SIZE


-       The  output  is  an  integer that contains the number of bytes used for
+       The output is an integer that contains the number  of  bytes  used  for
        internal linkage in compiled regular expressions. The value is 2, 3, or
-       4.  Larger  values  allow larger regular expressions to be compiled, at
-       the expense of slower matching. The default value of  2  is  sufficient
-       for  all  but  the  most massive patterns, since it allows the compiled
+       4. Larger values allow larger regular expressions to  be  compiled,  at
+       the  expense  of  slower matching. The default value of 2 is sufficient
+       for all but the most massive patterns, since  it  allows  the  compiled
        pattern to be up to 64K in size.


          PCRE_CONFIG_POSIX_MALLOC_THRESHOLD


-       The output is an integer that contains the threshold  above  which  the
-       POSIX  interface  uses malloc() for output vectors. Further details are
+       The  output  is  an integer that contains the threshold above which the
+       POSIX interface uses malloc() for output vectors. Further  details  are
        given in the pcreposix documentation.


          PCRE_CONFIG_MATCH_LIMIT


-       The output is a long integer that gives the default limit for the  num-
-       ber  of  internal  matching  function calls in a pcre_exec() execution.
+       The  output is a long integer that gives the default limit for the num-
+       ber of internal matching function calls  in  a  pcre_exec()  execution.
        Further details are given with pcre_exec() below.


          PCRE_CONFIG_MATCH_LIMIT_RECURSION


        The output is a long integer that gives the default limit for the depth
-       of   recursion  when  calling  the  internal  matching  function  in  a
-       pcre_exec() execution.  Further  details  are  given  with  pcre_exec()
+       of  recursion  when  calling  the  internal  matching  function  in   a
+       pcre_exec()  execution.  Further  details  are  given  with pcre_exec()
        below.


          PCRE_CONFIG_STACKRECURSE


-       The  output is an integer that is set to one if internal recursion when
+       The output is an integer that is set to one if internal recursion  when
        running pcre_exec() is implemented by recursive function calls that use
-       the  stack  to remember their state. This is the usual way that PCRE is
+       the stack to remember their state. This is the usual way that  PCRE  is
        compiled. The output is zero if PCRE was compiled to use blocks of data
-       on  the  heap  instead  of  recursive  function  calls.  In  this case,
-       pcre_stack_malloc and  pcre_stack_free  are  called  to  manage  memory
+       on the  heap  instead  of  recursive  function  calls.  In  this  case,
+       pcre_stack_malloc  and  pcre_stack_free  are  called  to  manage memory
        blocks on the heap, thus avoiding the use of the stack.
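
The PCRE_CONFIG_NEWLINE values listed in the hunk above (10, 13, 3338, -2, -1) can be turned into readable names with a small helper. This is only a sketch: demo_newline_name is a hypothetical function, and in a real program the integer would be filled in by pcre_config(PCRE_CONFIG_NEWLINE, &value).

```c
/* Map a PCRE_CONFIG_NEWLINE result to a readable name.
   demo_newline_name is a hypothetical helper, not part of PCRE. */
const char *demo_newline_name(int value)
{
    switch (value) {
    case 10:   return "LF";
    case 13:   return "CR";
    case 3338: return "CRLF";     /* equals (13 << 8) | 10 */
    case -2:   return "ANYCRLF";
    case -1:   return "ANY";
    default:   return "unknown";
    }
}
```

The CRLF value is arithmetically the CR code shifted left by eight bits and combined with the LF code, which is one way to read "derived from ASCII".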



@@ -1116,56 +1114,56 @@

        Either of the functions pcre_compile() or pcre_compile2() can be called
        to compile a pattern into an internal form. The only difference between
-       the  two interfaces is that pcre_compile2() has an additional argument,
+       the two interfaces is that pcre_compile2() has an additional  argument,
        errorcodeptr, via which a numerical error code can be returned.


        The pattern is a C string terminated by a binary zero, and is passed in
-       the  pattern  argument.  A  pointer to a single block of memory that is
-       obtained via pcre_malloc is returned. This contains the  compiled  code
+       the pattern argument. A pointer to a single block  of  memory  that  is
+       obtained  via  pcre_malloc is returned. This contains the compiled code
        and related data. The pcre type is defined for the returned block; this
        is a typedef for a structure whose contents are not externally defined.
        It is up to the caller to free the memory (via pcre_free) when it is no
        longer required.


-       Although the compiled code of a PCRE regex is relocatable, that is,  it
+       Although  the compiled code of a PCRE regex is relocatable, that is, it
        does not depend on memory location, the complete pcre data block is not
-       fully relocatable, because it may contain a copy of the tableptr  argu-
+       fully  relocatable, because it may contain a copy of the tableptr argu-
        ment, which is an address (see below).


        The options argument contains various bit settings that affect the com-
-       pilation. It should be zero if no options are required.  The  available
-       options  are  described  below. Some of them (in particular, those that
-       are compatible with Perl, but also some others) can  also  be  set  and
-       unset  from  within  the  pattern  (see the detailed description in the
-       pcrepattern documentation). For those options that can be different  in
-       different  parts  of  the pattern, the contents of the options argument
+       pilation.  It  should be zero if no options are required. The available
+       options are described below. Some of them (in  particular,  those  that
+       are  compatible  with  Perl,  but also some others) can also be set and
+       unset from within the pattern (see  the  detailed  description  in  the
+       pcrepattern  documentation). For those options that can be different in
+       different parts of the pattern, the contents of  the  options  argument
        specifies their initial settings at the start of compilation and execu-
-       tion.  The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at the
+       tion. The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at  the
        time of matching as well as at compile time.


        If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
-       if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and
+       if compilation of a pattern fails,  pcre_compile()  returns  NULL,  and
        sets the variable pointed to by errptr to point to a textual error mes-
        sage. This is a static string that is part of the library. You must not
        try to free it. The offset from the start of the pattern to the charac-
        ter where the error was discovered is placed in the variable pointed to
-       by erroffset, which must not be NULL. If it is, an immediate  error  is
+       by  erroffset,  which must not be NULL. If it is, an immediate error is
        given.


-       If  pcre_compile2()  is  used instead of pcre_compile(), and the error-
-       codeptr argument is not NULL, a non-zero error code number is  returned
-       via  this argument in the event of an error. This is in addition to the
+       If pcre_compile2() is used instead of pcre_compile(),  and  the  error-
+       codeptr  argument is not NULL, a non-zero error code number is returned
+       via this argument in the event of an error. This is in addition to  the
        textual error message. Error codes and messages are listed below.


-       If the final argument, tableptr, is NULL, PCRE uses a  default  set  of
-       character  tables  that  are  built  when  PCRE  is compiled, using the
-       default C locale. Otherwise, tableptr must be an address  that  is  the
-       result  of  a  call to pcre_maketables(). This value is stored with the
-       compiled pattern, and used again by pcre_exec(), unless  another  table
+       If  the  final  argument, tableptr, is NULL, PCRE uses a default set of
+       character tables that are  built  when  PCRE  is  compiled,  using  the
+       default  C  locale.  Otherwise, tableptr must be an address that is the
+       result of a call to pcre_maketables(). This value is  stored  with  the
+       compiled  pattern,  and used again by pcre_exec(), unless another table
        pointer is passed to it. For more discussion, see the section on locale
        support below.


-       This code fragment shows a typical straightforward  call  to  pcre_com-
+       This  code  fragment  shows a typical straightforward call to pcre_com-
        pile():


          pcre *re;
@@ -1178,137 +1176,137 @@
            &erroffset,       /* for error offset */
            NULL);            /* use default character tables */


-       The  following  names  for option bits are defined in the pcre.h header
+       The following names for option bits are defined in  the  pcre.h  header
        file:


          PCRE_ANCHORED


        If this bit is set, the pattern is forced to be "anchored", that is, it
-       is  constrained to match only at the first matching point in the string
-       that is being searched (the "subject string"). This effect can also  be
-       achieved  by appropriate constructs in the pattern itself, which is the
+       is constrained to match only at the first matching point in the  string
+       that  is being searched (the "subject string"). This effect can also be
+       achieved by appropriate constructs in the pattern itself, which is  the
        only way to do it in Perl.


          PCRE_AUTO_CALLOUT


        If this bit is set, pcre_compile() automatically inserts callout items,
-       all  with  number  255, before each pattern item. For discussion of the
+       all with number 255, before each pattern item. For  discussion  of  the
        callout facility, see the pcrecallout documentation.


          PCRE_BSR_ANYCRLF
          PCRE_BSR_UNICODE


        These options (which are mutually exclusive) control what the \R escape
-       sequence  matches.  The choice is either to match only CR, LF, or CRLF,
+       sequence matches. The choice is either to match only CR, LF,  or  CRLF,
        or to match any Unicode newline sequence. The default is specified when
        PCRE is built. It can be overridden from within the pattern, or by set-
        ting an option when a compiled pattern is matched.


          PCRE_CASELESS


-       If this bit is set, letters in the pattern match both upper  and  lower
-       case  letters.  It  is  equivalent  to  Perl's /i option, and it can be
-       changed within a pattern by a (?i) option setting. In UTF-8 mode,  PCRE
-       always  understands the concept of case for characters whose values are
-       less than 128, so caseless matching is always possible. For  characters
-       with  higher  values,  the concept of case is supported if PCRE is com-
-       piled with Unicode property support, but not otherwise. If you want  to
-       use  caseless  matching  for  characters 128 and above, you must ensure
-       that PCRE is compiled with Unicode property support  as  well  as  with
+       If  this  bit is set, letters in the pattern match both upper and lower
+       case letters. It is equivalent to Perl's  /i  option,  and  it  can  be
+       changed  within a pattern by a (?i) option setting. In UTF-8 mode, PCRE
+       always understands the concept of case for characters whose values  are
+       less  than 128, so caseless matching is always possible. For characters
+       with higher values, the concept of case is supported if  PCRE  is  com-
+       piled  with Unicode property support, but not otherwise. If you want to
+       use caseless matching for characters 128 and  above,  you  must  ensure
+       that  PCRE  is  compiled  with Unicode property support as well as with
        UTF-8 support.


          PCRE_DOLLAR_ENDONLY


-       If  this bit is set, a dollar metacharacter in the pattern matches only
-       at the end of the subject string. Without this option,  a  dollar  also
-       matches  immediately before a newline at the end of the string (but not
-       before any other newlines). The PCRE_DOLLAR_ENDONLY option  is  ignored
-       if  PCRE_MULTILINE  is  set.   There is no equivalent to this option in
+       If this bit is set, a dollar metacharacter in the pattern matches  only
+       at  the  end  of the subject string. Without this option, a dollar also
+       matches immediately before a newline at the end of the string (but  not
+       before  any  other newlines). The PCRE_DOLLAR_ENDONLY option is ignored
+       if PCRE_MULTILINE is set.  There is no equivalent  to  this  option  in
        Perl, and no way to set it within a pattern.


          PCRE_DOTALL


        If this bit is set, a dot metacharacter in the pattern matches all char-
-       acters,  including  those that indicate newline. Without it, a dot does
-       not match when the current position is at a  newline.  This  option  is
-       equivalent  to Perl's /s option, and it can be changed within a pattern
-       by a (?s) option setting. A negative class such as [^a] always  matches
+       acters, including those that indicate newline. Without it, a  dot  does
+       not  match  when  the  current position is at a newline. This option is
+       equivalent to Perl's /s option, and it can be changed within a  pattern
+       by  a (?s) option setting. A negative class such as [^a] always matches
        newline characters, independent of the setting of this option.


          PCRE_DUPNAMES


-       If  this  bit is set, names used to identify capturing subpatterns need
+       If this bit is set, names used to identify capturing  subpatterns  need
        not be unique. This can be helpful for certain types of pattern when it
-       is  known  that  only  one instance of the named subpattern can ever be
-       matched. There are more details of named subpatterns  below;  see  also
+       is known that only one instance of the named  subpattern  can  ever  be
+       matched.  There  are  more details of named subpatterns below; see also
        the pcrepattern documentation.


          PCRE_EXTENDED


-       If  this  bit  is  set,  whitespace  data characters in the pattern are
+       If this bit is set, whitespace  data  characters  in  the  pattern  are
        totally ignored except when escaped or inside a character class. White-
        space does not include the VT character (code 11). In addition, charac-
        ters between an unescaped # outside a character class and the next new-
-       line,  inclusive,  are  also  ignored.  This is equivalent to Perl's /x
-       option, and it can be changed within a pattern by a  (?x)  option  set-
+       line, inclusive, are also ignored. This  is  equivalent  to  Perl's  /x
+       option,  and  it  can be changed within a pattern by a (?x) option set-
        ting.


-       This  option  makes  it possible to include comments inside complicated
-       patterns.  Note, however, that this applies only  to  data  characters.
-       Whitespace   characters  may  never  appear  within  special  character
-       sequences in a pattern, for  example  within  the  sequence  (?(  which
+       This option makes it possible to include  comments  inside  complicated
+       patterns.   Note,  however,  that this applies only to data characters.
+       Whitespace  characters  may  never  appear  within  special   character
+       sequences  in  a  pattern,  for  example  within the sequence (?( which
        introduces a conditional subpattern.


          PCRE_EXTRA


-       This  option  was invented in order to turn on additional functionality
-       of PCRE that is incompatible with Perl, but it  is  currently  of  very
-       little  use. When set, any backslash in a pattern that is followed by a
-       letter that has no special meaning  causes  an  error,  thus  reserving
-       these  combinations  for  future  expansion.  By default, as in Perl, a
-       backslash followed by a letter with no special meaning is treated as  a
-       literal.  (Perl can, however, be persuaded to give a warning for this.)
-       There are at present no other features controlled by  this  option.  It
+       This option was invented in order to turn on  additional  functionality
+       of  PCRE  that  is  incompatible with Perl, but it is currently of very
+       little use. When set, any backslash in a pattern that is followed by  a
+       letter  that  has  no  special  meaning causes an error, thus reserving
+       these combinations for future expansion. By  default,  as  in  Perl,  a
+       backslash  followed by a letter with no special meaning is treated as a
+       literal. (Perl can, however, be persuaded to give a warning for  this.)
+       There  are  at  present no other features controlled by this option. It
        can also be set by a (?X) option setting within a pattern.


          PCRE_FIRSTLINE


-       If  this  option  is  set,  an  unanchored pattern is required to match
-       before or at the first  newline  in  the  subject  string,  though  the
+       If this option is set, an  unanchored  pattern  is  required  to  match
+       before  or  at  the  first  newline  in  the subject string, though the
        matched text may continue over the newline.


          PCRE_JAVASCRIPT_COMPAT


        If this option is set, PCRE's behaviour is changed in some ways so that
-       it is compatible with JavaScript rather than Perl. The changes  are  as
+       it  is  compatible with JavaScript rather than Perl. The changes are as
        follows:


-       (1)  A  lone  closing square bracket in a pattern causes a compile-time
-       error, because this is illegal in JavaScript (by default it is  treated
+       (1) A lone closing square bracket in a pattern  causes  a  compile-time
+       error,  because this is illegal in JavaScript (by default it is treated
        as a data character). Thus, the pattern AB]CD becomes illegal when this
        option is set.


-       (2) At run time, a back reference to an unset subpattern group  matches
-       an  empty  string (by default this causes the current matching alterna-
-       tive to fail). A pattern such as (\1)(a) succeeds when this  option  is
-       set  (assuming  it can find an "a" in the subject), whereas it fails by
+       (2)  At run time, a back reference to an unset subpattern group matches
+       an empty string (by default this causes the current  matching  alterna-
+       tive  to  fail). A pattern such as (\1)(a) succeeds when this option is
+       set (assuming it can find an "a" in the subject), whereas it  fails  by
        default, for Perl compatibility.


          PCRE_MULTILINE


-       By default, PCRE treats the subject string as consisting  of  a  single
-       line  of characters (even if it actually contains newlines). The "start
-       of line" metacharacter (^) matches only at the  start  of  the  string,
-       while  the  "end  of line" metacharacter ($) matches only at the end of
+       By  default,  PCRE  treats the subject string as consisting of a single
+       line of characters (even if it actually contains newlines). The  "start
+       of  line"  metacharacter  (^)  matches only at the start of the string,
+       while the "end of line" metacharacter ($) matches only at  the  end  of
        the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY
        is set). This is the same as Perl.


-       When  PCRE_MULTILINE  is set, the "start of line" and "end of line"
-       constructs match immediately following or immediately  before  internal
-       newlines  in  the  subject string, respectively, as well as at the very
-       start and end. This is equivalent to Perl's /m option, and  it  can  be
+       When PCRE_MULTILINE is set, the "start of line" and  "end  of  line"
+       constructs  match  immediately following or immediately before internal
+       newlines in the subject string, respectively, as well as  at  the  very
+       start  and  end.  This is equivalent to Perl's /m option, and it can be
        changed within a pattern by a (?m) option setting. If there are no new-
-       lines in a subject string, or no occurrences of ^ or $  in  a  pattern,
+       lines  in  a  subject string, or no occurrences of ^ or $ in a pattern,
        setting PCRE_MULTILINE has no effect.


          PCRE_NEWLINE_CR
@@ -1317,32 +1315,32 @@
          PCRE_NEWLINE_ANYCRLF
          PCRE_NEWLINE_ANY


-       These  options  override the default newline definition that was chosen
-       when PCRE was built. Setting the first or the second specifies  that  a
-       newline  is  indicated  by a single character (CR or LF, respectively).
-       Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by  the
-       two-character  CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF specifies
+       These options override the default newline definition that  was  chosen
+       when  PCRE  was built. Setting the first or the second specifies that a
+       newline is indicated by a single character (CR  or  LF,  respectively).
+       Setting  PCRE_NEWLINE_CRLF specifies that a newline is indicated by the
+       two-character CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF  specifies
        that any of the three preceding sequences should be recognized. Setting
-       PCRE_NEWLINE_ANY  specifies that any Unicode newline sequence should be
+       PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should  be
        recognized. The Unicode newline sequences are the three just mentioned,
-       plus  the  single  characters  VT (vertical tab, U+000B), FF (formfeed,
-       U+000C), NEL (next line, U+0085), LS (line separator, U+2028),  and  PS
-       (paragraph  separator,  U+2029).  The  last  two are recognized only in
+       plus the single characters VT (vertical  tab,  U+000B),  FF  (formfeed,
+       U+000C),  NEL  (next line, U+0085), LS (line separator, U+2028), and PS
+       (paragraph separator, U+2029). The last  two  are  recognized  only  in
        UTF-8 mode.


-       The newline setting in the  options  word  uses  three  bits  that  are
+       The  newline  setting  in  the  options  word  uses three bits that are
        treated as a number, giving eight possibilities. Currently only six are
-       used (default plus the five values above). This means that if  you  set
-       more  than one newline option, the combination may or may not be sensi-
+       used  (default  plus the five values above). This means that if you set
+       more than one newline option, the combination may or may not be  sensi-
        ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
-       PCRE_NEWLINE_CRLF,  but other combinations may yield unused numbers and
+       PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers  and
        cause an error.


-       The only time that a line break is specially recognized when  compiling
-       a  pattern  is  if  PCRE_EXTENDED  is set, and an unescaped # outside a
-       character class is encountered. This indicates  a  comment  that  lasts
-       until  after the next line break sequence. In other circumstances, line
-       break  sequences  are  treated  as  literal  data,   except   that   in
+       The  only time that a line break is specially recognized when compiling
+       a pattern is if PCRE_EXTENDED is set, and  an  unescaped  #  outside  a
+       character  class  is  encountered.  This indicates a comment that lasts
+       until after the next line break sequence. In other circumstances,  line
+       break   sequences   are   treated  as  literal  data,  except  that  in
        PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters
        and are therefore ignored.
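
The newline options described above occupy a three-bit field that is read as a number, which is why OR-ing two of them can produce either another valid setting or an unused value. The following is a minimal sketch of that arithmetic. The EX_ names and helper functions are hypothetical, and the bit values are assumed to mirror the PCRE_NEWLINE_xxx definitions in pcre.h of this period; check your own header before relying on them.

```c
/* Assumed values, mirroring PCRE_NEWLINE_xxx in pcre.h (hypothetical EX_ names). */
#define EX_NEWLINE_CR      0x00100000
#define EX_NEWLINE_LF      0x00200000
#define EX_NEWLINE_CRLF    0x00300000
#define EX_NEWLINE_ANY     0x00400000
#define EX_NEWLINE_ANYCRLF 0x00500000
#define EX_NEWLINE_MASK    0x00700000  /* the three bits used for the setting */

/* Extract the newline setting from an options word as a number 0..7. */
int ex_newline_field(int options)
{
    return (options & EX_NEWLINE_MASK) >> 20;
}

/* Only 0 (default) and 1..5 are in use; 6 and 7 are unused numbers. */
int ex_newline_is_defined(int options)
{
    return ex_newline_field(options) <= 5;
}
```

Under these assumptions, EX_NEWLINE_CR | EX_NEWLINE_LF yields the same field value as EX_NEWLINE_CRLF, whereas EX_NEWLINE_LF | EX_NEWLINE_ANY yields 6, one of the two unused numbers.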


@@ -1352,46 +1350,46 @@
          PCRE_NO_AUTO_CAPTURE


        If this option is set, it disables the use of numbered capturing paren-
-       theses in the pattern. Any opening parenthesis that is not followed  by
-       ?  behaves as if it were followed by ?: but named parentheses can still
-       be used for capturing (and they acquire  numbers  in  the  usual  way).
+       theses  in the pattern. Any opening parenthesis that is not followed by
+       ? behaves as if it were followed by ?: but named parentheses can  still
+       be  used  for  capturing  (and  they acquire numbers in the usual way).
        There is no equivalent of this option in Perl.


          PCRE_UNGREEDY


-       This  option  inverts  the "greediness" of the quantifiers so that they
-       are not greedy by default, but become greedy if followed by "?". It  is
-       not  compatible  with Perl. It can also be set by a (?U) option setting
+       This option inverts the "greediness" of the quantifiers  so  that  they
+       are  not greedy by default, but become greedy if followed by "?". It is
+       not compatible with Perl. It can also be set by a (?U)  option  setting
        within the pattern.


          PCRE_UTF8


-       This option causes PCRE to regard both the pattern and the  subject  as
-       strings  of  UTF-8 characters instead of single-byte character strings.
-       However, it is available only when PCRE is built to include UTF-8  sup-
-       port.  If not, the use of this option provokes an error. Details of how
-       this option changes the behaviour of PCRE are given in the  section  on
+       This  option  causes PCRE to regard both the pattern and the subject as
+       strings of UTF-8 characters instead of single-byte  character  strings.
+       However,  it is available only when PCRE is built to include UTF-8 sup-
+       port. If not, the use of this option provokes an error. Details of  how
+       this  option  changes the behaviour of PCRE are given in the section on
        UTF-8 support in the main pcre page.


          PCRE_NO_UTF8_CHECK


        When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
-       automatically checked. There is a  discussion  about  the  validity  of
-       UTF-8  strings  in  the main pcre page. If an invalid UTF-8 sequence of
-       bytes is found, pcre_compile() returns an error. If  you  already  know
+       automatically  checked.  There  is  a  discussion about the validity of
+       UTF-8 strings in the main pcre page. If an invalid  UTF-8  sequence  of
+       bytes  is  found,  pcre_compile() returns an error. If you already know
        that your pattern is valid, and you want to skip this check for perfor-
-       mance reasons, you can set the PCRE_NO_UTF8_CHECK option.  When  it  is
-       set,  the  effect  of  passing  an invalid UTF-8 string as a pattern is
-       undefined. It may cause your program to crash. Note  that  this  option
-       can  also be passed to pcre_exec() and pcre_dfa_exec(), to suppress the
+       mance  reasons,  you  can set the PCRE_NO_UTF8_CHECK option. When it is
+       set, the effect of passing an invalid UTF-8  string  as  a  pattern  is
+       undefined.  It  may  cause your program to crash. Note that this option
+       can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress  the
        UTF-8 validity checking of subject strings.



COMPILATION ERROR CODES

-       The following table lists the error  codes  that  may  be  returned  by
-       pcre_compile2(),  along with the error messages that may be returned by
-       both compiling functions. As PCRE has developed, some error codes  have
+       The  following  table  lists  the  error  codes that may be returned by
+       pcre_compile2(), along with the error messages that may be returned  by
+       both  compiling functions. As PCRE has developed, some error codes have
        fallen out of use. To avoid confusion, they have not been re-used.


           0  no error
@@ -1447,7 +1445,7 @@
          50  [this code is not in use]
          51  octal value is greater than \377 (not in UTF-8 mode)
          52  internal error: overran compiling workspace
-         53   internal  error:  previously-checked  referenced  subpattern not
+         53  internal  error:  previously-checked  referenced  subpattern  not
        found
          54  DEFINE group contains more than one branch
          55  repeating a DEFINE group is not allowed
@@ -1462,7 +1460,7 @@
          63  digit expected after (?+
          64  ] is an invalid data character in JavaScript compatibility mode


-       The numbers 32 and 10000 in errors 48 and 49  are  defaults;  different
+       The  numbers  32  and 10000 in errors 48 and 49 are defaults; different
        values may be used if the limits were changed when PCRE was built.



@@ -1471,32 +1469,32 @@
        pcre_extra *pcre_study(const pcre *code, int options,
             const char **errptr);


-       If  a  compiled  pattern is going to be used several times, it is worth
+       If a compiled pattern is going to be used several times,  it  is  worth
        spending more time analyzing it in order to speed up the time taken for
-       matching.  The function pcre_study() takes a pointer to a compiled pat-
+       matching. The function pcre_study() takes a pointer to a compiled  pat-
        tern as its first argument. If studying the pattern produces additional
-       information  that  will  help speed up matching, pcre_study() returns a
-       pointer to a pcre_extra block, in which the study_data field points  to
+       information that will help speed up matching,  pcre_study()  returns  a
+       pointer  to a pcre_extra block, in which the study_data field points to
        the results of the study.


        The  returned  value  from  pcre_study()  can  be  passed  directly  to
-       pcre_exec(). However, a pcre_extra block  also  contains  other  fields
-       that  can  be  set  by the caller before the block is passed; these are
+       pcre_exec().  However,  a  pcre_extra  block also contains other fields
+       that can be set by the caller before the block  is  passed;  these  are
        described below in the section on matching a pattern.


-       If studying the pattern does not  produce  any  additional  information
+       If  studying  the  pattern  does not produce any additional information
        pcre_study() returns NULL. In that circumstance, if the calling program
-       wants to pass any of the other fields to pcre_exec(), it  must  set  up
+       wants  to  pass  any of the other fields to pcre_exec(), it must set up
        its own pcre_extra block.


-       The  second  argument of pcre_study() contains option bits. At present,
+       The second argument of pcre_study() contains option bits.  At  present,
        no options are defined, and this argument should always be zero.


-       The third argument for pcre_study() is a pointer for an error  message.
-       If  studying  succeeds  (even  if no data is returned), the variable it
-       points to is set to NULL. Otherwise it is set to  point  to  a  textual
+       The  third argument for pcre_study() is a pointer for an error message.
+       If studying succeeds (even if no data is  returned),  the  variable  it
+       points  to  is  set  to NULL. Otherwise it is set to point to a textual
        error message. This is a static string that is part of the library. You
-       must not try to free it. You should test the  error  pointer  for  NULL
+       must  not  try  to  free it. You should test the error pointer for NULL
        after calling pcre_study(), to be sure that it has run successfully.


        This is a typical call to pcre_study():
@@ -1508,62 +1506,62 @@
            &error);        /* set to NULL or points to a message */


        At present, studying a pattern is useful only for non-anchored patterns
-       that do not have a single fixed starting character. A bitmap of  possi-
+       that  do not have a single fixed starting character. A bitmap of possi-
        ble starting bytes is created.



LOCALE SUPPORT

-       PCRE  handles  caseless matching, and determines whether characters are
-       letters, digits, or whatever, by reference to a set of tables,  indexed
-       by  character  value.  When running in UTF-8 mode, this applies only to
-       characters with codes less than 128. Higher-valued  codes  never  match
-       escapes  such  as  \w or \d, but can be tested with \p if PCRE is built
-       with Unicode character property support. The use of locales  with  Uni-
-       code  is discouraged. If you are handling characters with codes greater
-       than 128, you should either use UTF-8 and Unicode, or use locales,  but
+       PCRE handles caseless matching, and determines whether  characters  are
+       letters,  digits, or whatever, by reference to a set of tables, indexed
+       by character value. When running in UTF-8 mode, this  applies  only  to
+       characters  with  codes  less than 128. Higher-valued codes never match
+       escapes such as \w or \d, but can be tested with \p if  PCRE  is  built
+       with  Unicode  character property support. The use of locales with Uni-
+       code is discouraged. If you are handling characters with codes  greater
+       than  128, you should either use UTF-8 and Unicode, or use locales, but
        not try to mix the two.


-       PCRE  contains  an  internal set of tables that are used when the final
-       argument of pcre_compile() is  NULL.  These  are  sufficient  for  many
+       PCRE contains an internal set of tables that are used  when  the  final
+       argument  of  pcre_compile()  is  NULL.  These  are sufficient for many
        applications.  Normally, the internal tables recognize only ASCII char-
        acters. However, when PCRE is built, it is possible to cause the inter-
        nal tables to be rebuilt in the default "C" locale of the local system,
        which may cause them to be different.


-       The internal tables can always be overridden by tables supplied by  the
+       The  internal tables can always be overridden by tables supplied by the
        application that calls PCRE. These may be created in a different locale
-       from the default. As more and more applications change  to  using  Uni-
+       from  the  default.  As more and more applications change to using Uni-
        code, the need for this locale support is expected to die away.


-       External  tables  are  built by calling the pcre_maketables() function,
-       which has no arguments, in the relevant locale. The result can then  be
-       passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For
-       example, to build and use tables that are appropriate  for  the  French
-       locale  (where  accented  characters  with  values greater than 128 are
+       External tables are built by calling  the  pcre_maketables()  function,
+       which  has no arguments, in the relevant locale. The result can then be
+       passed to pcre_compile() or pcre_exec()  as  often  as  necessary.  For
+       example,  to  build  and use tables that are appropriate for the French
+       locale (where accented characters with  values  greater  than  128  are
        treated as letters), the following code could be used:


          setlocale(LC_CTYPE, "fr_FR");
          tables = pcre_maketables();
          re = pcre_compile(..., tables);


-       The locale name "fr_FR" is used on Linux and other  Unix-like  systems;
+       The  locale  name "fr_FR" is used on Linux and other Unix-like systems;
        if you are using Windows, the name for the French locale is "french".


-       When  pcre_maketables()  runs,  the  tables are built in memory that is
-       obtained via pcre_malloc. It is the caller's responsibility  to  ensure
-       that  the memory containing the tables remains available for as long as
+       When pcre_maketables() runs, the tables are built  in  memory  that  is
+       obtained  via  pcre_malloc. It is the caller's responsibility to ensure
+       that the memory containing the tables remains available for as long  as
        it is needed.


        The pointer that is passed to pcre_compile() is saved with the compiled
-       pattern,  and the same tables are used via this pointer by pcre_study()
+       pattern, and the same tables are used via this pointer by  pcre_study()
        and normally also by pcre_exec(). Thus, by default, for any single pat-
        tern, compilation, studying and matching all happen in the same locale,
        but different patterns can be compiled in different locales.


-       It is possible to pass a table pointer or NULL (indicating the  use  of
-       the  internal  tables)  to  pcre_exec(). Although not intended for this
-       purpose, this facility could be used to match a pattern in a  different
+       It  is  possible to pass a table pointer or NULL (indicating the use of
+       the internal tables) to pcre_exec(). Although  not  intended  for  this
+       purpose,  this facility could be used to match a pattern in a different
        locale from the one in which it was compiled. Passing table pointers at
        run time is discussed below in the section on matching a pattern.
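
The locale section above describes tables that are indexed by character value and built in whatever locale is current. The sketch below illustrates that idea only: it is not the layout that pcre_maketables() produces, and the ex_ names and flag values are hypothetical. It fills one table with character-class flags and another with case-flipped values, using the standard ctype functions.

```c
#include <ctype.h>

/* Hypothetical flag bits for the illustration; not PCRE's internal values. */
enum { EX_LETTER = 1, EX_DIGIT = 2 };

/* Build two 256-entry tables from the ctype functions of the current locale:
   flags[c] records whether c is a letter or a digit, and flipcase[c] holds
   the other-case form of c (or c itself if it has no case). */
void ex_build_tables(unsigned char flags[256], unsigned char flipcase[256])
{
    for (int c = 0; c < 256; c++) {
        flags[c] = (unsigned char)((isalpha(c) ? EX_LETTER : 0) |
                                   (isdigit(c) ? EX_DIGIT : 0));
        flipcase[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
    }
}
```

Calling setlocale(LC_CTYPE, ...) before ex_build_tables() changes which bytes above 127 are flagged as letters, which is the effect the text describes for a French locale. In the default "C" locale only the ASCII letters are flagged.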


@@ -1573,15 +1571,15 @@
        int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
             int what, void *where);


-       The pcre_fullinfo() function returns information about a compiled  pat-
+       The  pcre_fullinfo() function returns information about a compiled pat-
        tern. It replaces the obsolete pcre_info() function, which is neverthe-
        less retained for backwards compatibility (and is documented below).


-       The first argument for pcre_fullinfo() is a  pointer  to  the  compiled
-       pattern.  The second argument is the result of pcre_study(), or NULL if
-       the pattern was not studied. The third argument specifies  which  piece
-       of  information  is required, and the fourth argument is a pointer to a
-       variable to receive the data. The yield of the  function  is  zero  for
+       The  first  argument  for  pcre_fullinfo() is a pointer to the compiled
+       pattern. The second argument is the result of pcre_study(), or NULL  if
+       the  pattern  was not studied. The third argument specifies which piece
+       of information is required, and the fourth argument is a pointer  to  a
+       variable  to  receive  the  data. The yield of the function is zero for
        success, or one of the following negative numbers:


          PCRE_ERROR_NULL       the argument code was NULL
@@ -1589,9 +1587,9 @@
          PCRE_ERROR_BADMAGIC   the "magic number" was not found
          PCRE_ERROR_BADOPTION  the value of what was invalid


-       The  "magic  number" is placed at the start of each compiled pattern as
-       a  simple check against passing an arbitrary memory pointer. Here is  a
-       typical  call  of pcre_fullinfo(), to obtain the length of the compiled
+       The "magic number" is placed at the start of each compiled  pattern  as
+       a  simple  check against passing an arbitrary memory pointer. Here is a
+       typical call of pcre_fullinfo(), to obtain the length of  the  compiled
        pattern:


          int rc;
@@ -1602,76 +1600,76 @@
            PCRE_INFO_SIZE,   /* what is required */
            &length);         /* where to put the data */


-       The possible values for the third argument are defined in  pcre.h,  and
+       The  possible  values for the third argument are defined in pcre.h, and
        are as follows:


          PCRE_INFO_BACKREFMAX


-       Return  the  number  of  the highest back reference in the pattern. The
-       fourth argument should point to an int variable. Zero  is  returned  if
+       Return the number of the highest back reference  in  the  pattern.  The
+       fourth  argument  should  point to an int variable. Zero is returned if
        there are no back references.


          PCRE_INFO_CAPTURECOUNT


-       Return  the  number of capturing subpatterns in the pattern. The fourth
+       Return the number of capturing subpatterns in the pattern.  The  fourth
        argument should point to an int variable.


          PCRE_INFO_DEFAULT_TABLES


-       Return a pointer to the internal default character tables within  PCRE.
-       The  fourth  argument should point to an unsigned char * variable. This
+       Return  a pointer to the internal default character tables within PCRE.
+       The fourth argument should point to an unsigned char *  variable.  This
        information call is provided for internal use by the pcre_study() func-
-       tion.  External  callers  can  cause PCRE to use its internal tables by
+       tion. External callers can cause PCRE to use  its  internal  tables  by
        passing a NULL table pointer.


          PCRE_INFO_FIRSTBYTE


-       Return information about the first byte of any matched  string,  for  a
-       non-anchored  pattern. The fourth argument should point to an int vari-
-       able. (This option used to be called PCRE_INFO_FIRSTCHAR; the old  name
+       Return  information  about  the first byte of any matched string, for a
+       non-anchored pattern. The fourth argument should point to an int  vari-
+       able.  (This option used to be called PCRE_INFO_FIRSTCHAR; the old name
        is still recognized for backwards compatibility.)


-       If  there  is  a  fixed first byte, for example, from a pattern such as
+       If there is a fixed first byte, for example, from  a  pattern  such  as
        (cat|cow|coyote), its value is returned. Otherwise, if either


-       (a) the pattern was compiled with the PCRE_MULTILINE option, and  every
+       (a)  the pattern was compiled with the PCRE_MULTILINE option, and every
        branch starts with "^", or


        (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
        set (if it were set, the pattern would be anchored),


-       -1 is returned, indicating that the pattern matches only at  the  start
-       of  a  subject string or after any newline within the string. Otherwise
+       -1  is  returned, indicating that the pattern matches only at the start
+       of a subject string or after any newline within the  string.  Otherwise
        -2 is returned. For anchored patterns, -2 is returned.


          PCRE_INFO_FIRSTTABLE


-       If the pattern was studied, and this resulted in the construction of  a
+       If  the pattern was studied, and this resulted in the construction of a
        256-bit table indicating a fixed set of bytes for the first byte in any
-       matching string, a pointer to the table is returned. Otherwise NULL  is
-       returned.  The fourth argument should point to an unsigned char * vari-
+       matching  string, a pointer to the table is returned. Otherwise NULL is
+       returned. The fourth argument should point to an unsigned char *  vari-
        able.


          PCRE_INFO_HASCRORLF


-       Return 1 if the pattern contains any explicit  matches  for  CR  or  LF
-       characters,  otherwise  0.  The  fourth argument should point to an int
-       variable. An explicit match is either a literal CR or LF character,  or
+       Return  1  if  the  pattern  contains any explicit matches for CR or LF
+       characters, otherwise 0. The fourth argument should  point  to  an  int
+       variable.  An explicit match is either a literal CR or LF character, or
        \r or \n.


          PCRE_INFO_JCHANGED


-       Return  1  if  the (?J) or (?-J) option setting is used in the pattern,
-       otherwise 0. The fourth argument should point to an int variable.  (?J)
+       Return 1 if the (?J) or (?-J) option setting is used  in  the  pattern,
+       otherwise  0. The fourth argument should point to an int variable. (?J)
        and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.


          PCRE_INFO_LASTLITERAL


-       Return  the  value of the rightmost literal byte that must exist in any
-       matched string, other than at its  start,  if  such  a  byte  has  been
+       Return the value of the rightmost literal byte that must exist  in  any
+       matched  string,  other  than  at  its  start,  if such a byte has been
        recorded. The fourth argument should point to an int variable. If there
-       is no such byte, -1 is returned. For anchored patterns, a last  literal
-       byte  is  recorded only if it follows something of variable length. For
+       is  no such byte, -1 is returned. For anchored patterns, a last literal
+       byte is recorded only if it follows something of variable  length.  For
        example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
        /^a\dz\d/ the returned value is -1.


@@ -1679,34 +1677,34 @@
          PCRE_INFO_NAMEENTRYSIZE
          PCRE_INFO_NAMETABLE


-       PCRE  supports the use of named as well as numbered capturing parenthe-
-       ses. The names are just an additional way of identifying the  parenthe-
+       PCRE supports the use of named as well as numbered capturing  parenthe-
+       ses.  The names are just an additional way of identifying the parenthe-
        ses, which still acquire numbers. Several convenience functions such as
-       pcre_get_named_substring() are provided for  extracting  captured  sub-
-       strings  by  name. It is also possible to extract the data directly, by
-       first converting the name to a number in order to  access  the  correct
+       pcre_get_named_substring()  are  provided  for extracting captured sub-
+       strings by name. It is also possible to extract the data  directly,  by
+       first  converting  the  name to a number in order to access the correct
        pointers in the output vector (described with pcre_exec() below). To do
-       the conversion, you need  to  use  the  name-to-number  map,  which  is
+       the  conversion,  you  need  to  use  the  name-to-number map, which is
        described by these three values.


        The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
        gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
-       of  each  entry;  both  of  these  return  an int value. The entry size
-       depends on the length of the longest name. PCRE_INFO_NAMETABLE  returns
-       a  pointer  to  the  first  entry of the table (a pointer to char). The
+       of each entry; both of these  return  an  int  value.  The  entry  size
+       depends  on the length of the longest name. PCRE_INFO_NAMETABLE returns
+       a pointer to the first entry of the table  (a  pointer  to  char).  The
        first two bytes of each entry are the number of the capturing parenthe-
-       sis,  most  significant byte first. The rest of the entry is the corre-
-       sponding name, zero terminated. The names are  in  alphabetical  order.
+       sis, most significant byte first. The rest of the entry is  the  corre-
+       sponding  name,  zero  terminated. The names are in alphabetical order.
        When PCRE_DUPNAMES is set, duplicate names are in order of their paren-
-       theses numbers. For example, consider  the  following  pattern  (assume
-       PCRE_EXTENDED  is  set,  so  white  space  -  including  newlines  - is
+       theses  numbers.  For  example,  consider the following pattern (assume
+       PCRE_EXTENDED is  set,  so  white  space  -  including  newlines  -  is
        ignored):


          (?<date> (?<year>(\d\d)?\d\d) -
          (?<month>\d\d) - (?<day>\d\d) )


-       There are four named subpatterns, so the table has  four  entries,  and
-       each  entry  in the table is eight bytes long. The table is as follows,
+       There  are  four  named subpatterns, so the table has four entries, and
+       each entry in the table is eight bytes long. The table is  as  follows,
        with non-printing bytes shown in hexadecimal, and undefined bytes shown
        as ??:


@@ -1715,16 +1713,17 @@
          00 04 m  o  n  t  h  00
          00 02 y  e  a  r  00 ??


-       When  writing  code  to  extract  data from named subpatterns using the
-       name-to-number map, remember that the length of the entries  is  likely
+       When writing code to extract data  from  named  subpatterns  using  the
+       name-to-number  map,  remember that the length of the entries is likely
        to be different for each compiled pattern.
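
The name-to-number map described above can be walked with a few lines of C. The following is a minimal sketch that works on a table already in memory. The function name and the sample table are hypothetical; the sample reproduces the four entries of the date pattern, with the undefined bytes filled with zeros. In real use the three inputs would come from pcre_fullinfo() with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT and PCRE_INFO_NAMEENTRYSIZE.

```c
#include <string.h>

/* Sample table for the date pattern: four entries of eight bytes each.
   Bytes shown as ?? in the text are filled with zeros here. */
static const unsigned char ex_sample_table[4 * 8] = {
    0x00, 0x01, 'd', 'a', 't', 'e', 0x00, 0x00,
    0x00, 0x05, 'd', 'a', 'y', 0x00, 0x00, 0x00,
    0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00,
    0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0x00
};

/* Return the number of the capturing parenthesis called name, or -1 if the
   name is not in the table. Each entry starts with the number, most
   significant byte first, followed by the zero-terminated name. */
int ex_name_to_number(const unsigned char *table, int name_count,
                      int entry_size, const char *name)
{
    for (int i = 0; i < name_count; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

A linear scan keeps the sketch short. Because the names are stored in alphabetical order, a binary search is also possible when PCRE_DUPNAMES is not in use. The entry size is passed in rather than hard-coded because it differs between compiled patterns.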


          PCRE_INFO_OKPARTIAL


-       Return  1 if the pattern can be used for partial matching, otherwise 0.
-       The fourth argument should point to an int  variable.  The  pcrepartial
-       documentation  lists  the restrictions that apply to patterns when par-
-       tial matching is used.
+       Return 1 if the pattern can be used for partial matching, otherwise  0.
+       The fourth argument should point to an int variable. From release 8.00,
+       this always returns 1, because the restrictions that previously applied
+       to  partial  matching  have  been lifted. The pcrepartial documentation
+       gives details of partial matching.


          PCRE_INFO_OPTIONS


@@ -1929,7 +1928,7 @@
        The  unused  bits of the options argument for pcre_exec() must be zero.
        The only bits that may  be  set  are  PCRE_ANCHORED,  PCRE_NEWLINE_xxx,
        PCRE_NOTBOL,    PCRE_NOTEOL,   PCRE_NOTEMPTY,   PCRE_NO_START_OPTIMIZE,
-       PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.
+       PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and PCRE_PARTIAL_HARD.


          PCRE_ANCHORED


@@ -2021,7 +2020,7 @@
        again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then
        if  that  fails by advancing the starting offset (see below) and trying
        an ordinary match again. There is some code that demonstrates how to do
-       this in the pcredemo.c sample program.
+       this in the pcredemo sample program.


          PCRE_NO_START_OPTIMIZE


@@ -2056,128 +2055,132 @@
        value  of startoffset that does not point to the start of a UTF-8 char-
        acter, is undefined. Your program may crash.


-         PCRE_PARTIAL
+         PCRE_PARTIAL_HARD
+         PCRE_PARTIAL_SOFT


-       This option turns on the  partial  matching  feature.  If  the  subject
-       string  fails to match the pattern, but at some point during the match-
-       ing process the end of the subject was reached (that  is,  the  subject
-       partially  matches  the  pattern and the failure to match occurred only
-       because there were not enough subject characters), pcre_exec()  returns
-       PCRE_ERROR_PARTIAL  instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is
-       used, there are restrictions on what may appear in the  pattern.  These
-       are discussed in the pcrepartial documentation.
+       These options turn on the partial matching feature. For backwards  com-
+       patibility,  PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial
+       match occurs if the end of the subject string is reached  successfully,
+       but  there  are not enough subject characters to complete the match. If
+       this happens when PCRE_PARTIAL_HARD  is  set,  pcre_exec()  immediately
+       returns  PCRE_ERROR_PARTIAL.  Otherwise,  if  PCRE_PARTIAL_SOFT is set,
+       matching continues by testing any other alternatives. Only if they  all
+       fail  is  PCRE_ERROR_PARTIAL  returned (instead of PCRE_ERROR_NOMATCH).
+       The portion of the string that provided the partial match is set as the
+       first  matching  string.  There  is  a  more detailed discussion in the
+       pcrepartial documentation.


    The string to be matched by pcre_exec()


-       The  subject string is passed to pcre_exec() as a pointer in subject, a
+       The subject string is passed to pcre_exec() as a pointer in subject,  a
        length (in bytes) in length, and a starting byte offset in startoffset.
        In UTF-8 mode, the byte offset must point to the start of a UTF-8 char-
-       acter. Unlike the pattern string, the subject may contain  binary  zero
-       bytes.  When the starting offset is zero, the search for a match starts
-       at the beginning of the subject, and this is by  far  the  most  common
+       acter.  Unlike  the pattern string, the subject may contain binary zero
+       bytes. When the starting offset is zero, the search for a match  starts
+       at  the  beginning  of  the subject, and this is by far the most common
        case.


-       A  non-zero  starting offset is useful when searching for another match
-       in the same subject by calling pcre_exec() again after a previous  suc-
-       cess.   Setting  startoffset differs from just passing over a shortened
-       string and setting PCRE_NOTBOL in the case of  a  pattern  that  begins
+       A non-zero starting offset is useful when searching for  another  match
+       in  the same subject by calling pcre_exec() again after a previous suc-
+       cess.  Setting startoffset differs from just passing over  a  shortened
+       string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins
        with any kind of lookbehind. For example, consider the pattern


          \Biss\B


-       which  finds  occurrences  of "iss" in the middle of words. (\B matches
-       only if the current position in the subject is not  a  word  boundary.)
-       When  applied  to the string "Mississipi" the first call to pcre_exec()
-       finds the first occurrence. If pcre_exec() is called  again  with  just
-       the  remainder  of  the  subject,  namely  "issipi", it does not match,
+       which finds occurrences of "iss" in the middle of  words.  (\B  matches
+       only  if  the  current position in the subject is not a word boundary.)
+       When applied to the string "Mississipi" the first call  to  pcre_exec()
+       finds  the  first  occurrence. If pcre_exec() is called again with just
+       the remainder of the subject,  namely  "issipi",  it  does  not  match,
        because \B is always false at the start of the subject, which is deemed
-       to  be  a  word  boundary. However, if pcre_exec() is passed the entire
+       to be a word boundary. However, if pcre_exec()  is  passed  the  entire
        string again, but with startoffset set to 4, it finds the second occur-
-       rence  of "iss" because it is able to look behind the starting point to
+       rence of "iss" because it is able to look behind the starting point  to
        discover that it is preceded by a letter.
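
The difference between passing a shortened string and passing a start offset can be reproduced without PCRE. The sketch below hand-codes the single pattern \Biss\B. It is an illustration of the word-boundary reasoning only, not of how pcre_exec() works internally, and all ex_ names are hypothetical. A word character is taken to be an ASCII letter, digit or underscore.

```c
#include <ctype.h>
#include <string.h>

/* 1 if pos is inside the subject and holds a word character, else 0.
   Positions outside the subject count as non-word. */
static int ex_is_word_char(const char *s, int length, int pos)
{
    if (pos < 0 || pos >= length) return 0;
    unsigned char c = (unsigned char)s[pos];
    return isalnum(c) || c == '_';
}

/* A word boundary lies between a word character and a non-word character. */
static int ex_is_word_boundary(const char *s, int length, int pos)
{
    return ex_is_word_char(s, length, pos - 1) != ex_is_word_char(s, length, pos);
}

/* Find the first "iss" at or after startoffset that has no word boundary
   on either side; return its offset, or -1 if there is none. The check
   before the match may look at the character preceding startoffset. */
int ex_find_inner_iss(const char *subject, int length, int startoffset)
{
    for (int pos = startoffset; pos + 3 <= length; pos++) {
        if (memcmp(subject + pos, "iss", 3) == 0 &&
            !ex_is_word_boundary(subject, length, pos) &&
            !ex_is_word_boundary(subject, length, pos + 3))
            return pos;
    }
    return -1;
}
```

With the whole subject "Mississipi" and a start offset of 4, the search can see the letter at offset 3 and finds the second occurrence. With the shortened subject "issipi" and offset 0, the start of the string counts as a word boundary, so nothing is found.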


-       If a non-zero starting offset is passed when the pattern  is  anchored,
+       If  a  non-zero starting offset is passed when the pattern is anchored,
        one attempt to match at the given offset is made. This can only succeed
-       if the pattern does not require the match to be at  the  start  of  the
+       if  the  pattern  does  not require the match to be at the start of the
        subject.


    How pcre_exec() returns captured substrings


-       In  general, a pattern matches a certain portion of the subject, and in
-       addition, further substrings from the subject  may  be  picked  out  by
-       parts  of  the  pattern.  Following the usage in Jeffrey Friedl's book,
-       this is called "capturing" in what follows, and the  phrase  "capturing
-       subpattern"  is  used for a fragment of a pattern that picks out a sub-
-       string. PCRE supports several other kinds of  parenthesized  subpattern
+       In general, a pattern matches a certain portion of the subject, and  in
+       addition,  further  substrings  from  the  subject may be picked out by
+       parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,
+       this  is  called "capturing" in what follows, and the phrase "capturing
+       subpattern" is used for a fragment of a pattern that picks out  a  sub-
+       string.  PCRE  supports several other kinds of parenthesized subpattern
        that do not cause substrings to be captured.


        Captured substrings are returned to the caller via a vector of integers
-       whose address is passed in ovector. The number of elements in the  vec-
-       tor  is  passed in ovecsize, which must be a non-negative number. Note:
+       whose  address is passed in ovector. The number of elements in the vec-
+       tor is passed in ovecsize, which must be a non-negative  number.  Note:
        this argument is NOT the size of ovector in bytes.


-       The first two-thirds of the vector is used to pass back  captured  sub-
-       strings,  each  substring using a pair of integers. The remaining third
-       of the vector is used as workspace by pcre_exec() while  matching  cap-
-       turing  subpatterns, and is not available for passing back information.
-       The number passed in ovecsize should always be a multiple of three.  If
+       The  first  two-thirds of the vector is used to pass back captured sub-
+       strings, each substring using a pair of integers. The  remaining  third
+       of  the  vector is used as workspace by pcre_exec() while matching cap-
+       turing subpatterns, and is not available for passing back  information.
+       The  number passed in ovecsize should always be a multiple of three. If
        it is not, it is rounded down.


-       When  a  match  is successful, information about captured substrings is
-       returned in pairs of integers, starting at the  beginning  of  ovector,
-       and  continuing  up  to two-thirds of its length at the most. The first
-       element of each pair is set to the byte offset of the  first  character
-       in  a  substring, and the second is set to the byte offset of the first
-       character after the end of a substring. Note: these values  are  always
+       When a match is successful, information about  captured  substrings  is
+       returned  in  pairs  of integers, starting at the beginning of ovector,
+       and continuing up to two-thirds of its length at the  most.  The  first
+       element  of  each pair is set to the byte offset of the first character
+       in a substring, and the second is set to the byte offset of  the  first
+       character  after  the end of a substring. Note: these values are always
        byte offsets, even in UTF-8 mode. They are not character counts.


-       The  first  pair  of  integers, ovector[0] and ovector[1], identify the
-       portion of the subject string matched by the entire pattern.  The  next
-       pair  is  used for the first capturing subpattern, and so on. The value
+       The first pair of integers, ovector[0]  and  ovector[1],  identify  the
+       portion  of  the subject string matched by the entire pattern. The next
+       pair is used for the first capturing subpattern, and so on.  The  value
        returned by pcre_exec() is one more than the highest numbered pair that
-       has  been  set.  For example, if two substrings have been captured, the
-       returned value is 3. If there are no capturing subpatterns, the  return
+       has been set.  For example, if two substrings have been  captured,  the
+       returned  value is 3. If there are no capturing subpatterns, the return
        value from a successful match is 1, indicating that just the first pair
        of offsets has been set.


        If a capturing subpattern is matched repeatedly, it is the last portion
        of the string that it matched that is returned.


-       If  the vector is too small to hold all the captured substring offsets,
+       If the vector is too small to hold all the captured substring  offsets,
        it is used as far as possible (up to two-thirds of its length), and the
-       function  returns  a value of zero. If the substring offsets are not of
-       interest, pcre_exec() may be called with ovector  passed  as  NULL  and
-       ovecsize  as zero. However, if the pattern contains back references and
-       the ovector is not big enough to remember the related substrings,  PCRE
-       has  to  get additional memory for use during matching. Thus it is usu-
+       function returns a value of zero. If the substring offsets are  not  of
+       interest,  pcre_exec()  may  be  called with ovector passed as NULL and
+       ovecsize as zero. However, if the pattern contains back references  and
+       the  ovector is not big enough to remember the related substrings, PCRE
+       has to get additional memory for use during matching. Thus it  is  usu-
        ally advisable to supply an ovector.


-       The pcre_info() function can be used to find  out  how  many  capturing
-       subpatterns  there  are  in  a  compiled pattern. The smallest size for
-       ovector that will allow for n captured substrings, in addition  to  the
+       The  pcre_info()  function  can  be used to find out how many capturing
+       subpatterns there are in a compiled  pattern.  The  smallest  size  for
+       ovector  that  will allow for n captured substrings, in addition to the
        offsets of the substring matched by the whole pattern, is (n+1)*3.
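
The sizing rules above can be sketched as two small helpers. This is only an illustration of the arithmetic described in the text; the names ovecsize_for_captures() and reportable_pairs() are hypothetical and are not part of the PCRE API.

```c
#include <assert.h>

/* Hypothetical helper: smallest ovecsize that leaves room for the
   whole-match pair plus n captured substrings, i.e. (n+1)*3. */
int ovecsize_for_captures(int n)
{
    assert(n >= 0);
    return (n + 1) * 3;
}

/* Hypothetical helper: number of offset pairs that can be passed back
   for a given ovecsize. The size is rounded down to a multiple of
   three, and only the first two-thirds holds pairs, which works out
   to one pair for every three elements. */
int reportable_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```

For a pattern with two capturing subpatterns, ovecsize_for_captures(2) gives 9, and a vector of 9 or 10 elements can report three pairs in either case, because the extra element is lost to rounding.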


-       It  is  possible for capturing subpattern number n+1 to match some part
+       It is possible for capturing subpattern number n+1 to match  some  part
        of the subject when subpattern n has not been used at all. For example,
-       if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the
+       if the string "abc" is matched  against  the  pattern  (a|(z))(bc)  the
        return from the function is 4, and subpatterns 1 and 3 are matched, but
-       2  is  not.  When  this happens, both values in the offset pairs corre-
+       2 is not. When this happens, both values in  the  offset  pairs  corre-
        sponding to unused subpatterns are set to -1.
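
A minimal sketch of how a caller might read one offset pair, including the -1 case for an unused subpattern. The function copy_capture() is a hypothetical name used for illustration (the library's own convenience functions are described later in the page), and it assumes rc is the positive value returned by a successful match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy capture number n from subject into buf as
   a NUL-terminated string. Returns the length copied, or -1 if the
   pair is beyond the highest one set, is unset (-1 offsets), or does
   not fit in buf. Offsets are byte offsets, not character counts. */
int copy_capture(const char *subject, const int *ovector, int rc,
                 int n, char *buf, size_t bufsize)
{
    assert(subject != NULL && ovector != NULL && buf != NULL);
    if (n < 0 || n >= rc) return -1;          /* pair not reported */

    int start = ovector[2 * n];
    int end = ovector[2 * n + 1];
    if (start < 0 || end < start) return -1;  /* unused subpattern */

    size_t len = (size_t)(end - start);
    if (len + 1 > bufsize) return -1;         /* no room for the NUL */

    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

With the example above, "abc" matched against (a|(z))(bc) yields a return of 4 and the pairs (0,3), (0,1), (-1,-1), (1,3), so the sketch returns "abc", "a", and "bc" for pairs 0, 1 and 3, and -1 for pair 2.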


-       Offset values that correspond to unused subpatterns at the end  of  the
-       expression  are  also  set  to  -1. For example, if the string "abc" is
-       matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are  not
-       matched.  The  return  from the function is 2, because the highest used
+       Offset  values  that correspond to unused subpatterns at the end of the
+       expression are also set to -1. For example,  if  the  string  "abc"  is
+       matched  against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not
+       matched. The return from the function is 2, because  the  highest  used
        capturing subpattern number is 1. However, you can refer to the offsets
-       for  the  second  and third capturing subpatterns if you wish (assuming
+       for the second and third capturing subpatterns if  you  wish  (assuming
        the vector is large enough, of course).


-       Some convenience functions are provided  for  extracting  the  captured
+       Some  convenience  functions  are  provided for extracting the captured
        substrings as separate strings. These are described below.


    Error return values from pcre_exec()


-       If  pcre_exec()  fails, it returns a negative number. The following are
+       If pcre_exec() fails, it returns a negative number. The  following  are
        defined in the header file:


          PCRE_ERROR_NOMATCH        (-1)
@@ -2186,7 +2189,7 @@


          PCRE_ERROR_NULL           (-2)


-       Either code or subject was passed as NULL,  or  ovector  was  NULL  and
+       Either  code  or  subject  was  passed as NULL, or ovector was NULL and
        ovecsize was not zero.


          PCRE_ERROR_BADOPTION      (-3)
@@ -2195,65 +2198,66 @@


          PCRE_ERROR_BADMAGIC       (-4)


-       PCRE  stores a 4-byte "magic number" at the start of the compiled code,
+       PCRE stores a 4-byte "magic number" at the start of the compiled  code,
        to catch the case when it is passed a junk pointer and to detect when a
        pattern that was compiled in an environment of one endianness is run in
-       an environment with the other endianness. This is the error  that  PCRE
+       an  environment  with the other endianness. This is the error that PCRE
        gives when the magic number is not present.


          PCRE_ERROR_UNKNOWN_OPCODE (-5)


        While running the pattern match, an unknown item was encountered in the
-       compiled pattern. This error could be caused by a bug  in  PCRE  or  by
+       compiled  pattern.  This  error  could be caused by a bug in PCRE or by
        overwriting of the compiled pattern.


          PCRE_ERROR_NOMEMORY       (-6)


-       If  a  pattern contains back references, but the ovector that is passed
+       If a pattern contains back references, but the ovector that  is  passed
        to pcre_exec() is not big enough to remember the referenced substrings,
-       PCRE  gets  a  block of memory at the start of matching to use for this
-       purpose. If the call via pcre_malloc() fails, this error is given.  The
+       PCRE gets a block of memory at the start of matching to  use  for  this
+       purpose.  If the call via pcre_malloc() fails, this error is given. The
        memory is automatically freed at the end of matching.


          PCRE_ERROR_NOSUBSTRING    (-7)


-       This  error is used by the pcre_copy_substring(), pcre_get_substring(),
+       This error is used by the pcre_copy_substring(),  pcre_get_substring(),
        and  pcre_get_substring_list()  functions  (see  below).  It  is  never
        returned by pcre_exec().


          PCRE_ERROR_MATCHLIMIT     (-8)


-       The  backtracking  limit,  as  specified  by the match_limit field in a
-       pcre_extra structure (or defaulted) was reached.  See  the  description
+       The backtracking limit, as specified by  the  match_limit  field  in  a
+       pcre_extra  structure  (or  defaulted) was reached. See the description
        above.


          PCRE_ERROR_CALLOUT        (-9)


        This error is never generated by pcre_exec() itself. It is provided for
-       use by callout functions that want to yield a distinctive  error  code.
+       use  by  callout functions that want to yield a distinctive error code.
        See the pcrecallout documentation for details.


          PCRE_ERROR_BADUTF8        (-10)


-       A  string  that contains an invalid UTF-8 byte sequence was passed as a
+       A string that contains an invalid UTF-8 byte sequence was passed  as  a
        subject.


          PCRE_ERROR_BADUTF8_OFFSET (-11)


        The UTF-8 byte sequence that was passed as a subject was valid, but the
-       value  of startoffset did not point to the beginning of a UTF-8 charac-
+       value of startoffset did not point to the beginning of a UTF-8  charac-
        ter.


          PCRE_ERROR_PARTIAL        (-12)


-       The subject string did not match, but it did match partially.  See  the
+       The  subject  string did not match, but it did match partially. See the
        pcrepartial documentation for details of partial matching.


          PCRE_ERROR_BADPARTIAL     (-13)


-       The  PCRE_PARTIAL  option  was  used with a compiled pattern containing
-       items that are not supported for partial matching. See the  pcrepartial
-       documentation for details of partial matching.
+       This code is no longer in  use.  It  was  formerly  returned  when  the
+       PCRE_PARTIAL  option  was used with a compiled pattern containing items
+       that were  not  supported  for  partial  matching.  From  release  8.00
+       onwards, there are no restrictions on partial matching.


          PCRE_ERROR_INTERNAL       (-14)


@@ -2517,19 +2521,24 @@
        The unused bits of the options argument  for  pcre_dfa_exec()  must  be
        zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW-
        LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,  PCRE_NO_UTF8_CHECK,
-       PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
-       three of these are the same as for pcre_exec(), so their description is
-       not repeated here.
+       PCRE_PARTIAL_HARD,     PCRE_PARTIAL_SOFT,     PCRE_DFA_SHORTEST,    and
+       PCRE_DFA_RESTART. All but the last four of these are exactly  the  same
+       as for pcre_exec(), so their description is not repeated here.


-         PCRE_PARTIAL
+         PCRE_PARTIAL_HARD
+         PCRE_PARTIAL_SOFT


-       This  has  the  same general effect as it does for pcre_exec(), but the
-       details  are  slightly  different.  When  PCRE_PARTIAL   is   set   for
-       pcre_dfa_exec(),  the  return code PCRE_ERROR_NOMATCH is converted into
-       PCRE_ERROR_PARTIAL if the end of the subject  is  reached,  there  have
-       been no complete matches, but there is still at least one matching pos-
-       sibility. The portion of the string that provided the partial match  is
-       set as the first matching string.
+       These  have the same general effect as they do for pcre_exec(), but the
+       details are slightly  different.  When  PCRE_PARTIAL_HARD  is  set  for
+       pcre_dfa_exec(),  it  returns PCRE_ERROR_PARTIAL if the end of the sub-
+       ject is reached and there is still at least  one  matching  possibility
+       that requires additional characters. This happens even if some complete
+       matches have also been found. When PCRE_PARTIAL_SOFT is set, the return
+       code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end
+       of the subject is reached, there have been  no  complete  matches,  but
+       there  is  still  at least one matching possibility. The portion of the
+       string that provided the longest partial match  is  set  as  the  first
+       matching string in both cases.


          PCRE_DFA_SHORTEST


@@ -2540,21 +2549,20 @@

          PCRE_DFA_RESTART


-       When pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option,  and
-       returns  a  partial  match, it is possible to call it again, with addi-
-       tional subject characters, and have it continue with  the  same  match.
-       The  PCRE_DFA_RESTART  option requests this action; when it is set, the
-       workspace and wscount options must reference the same vector as  before
-       because  data  about  the  match so far is left in them after a partial
-       match. There is more discussion of this  facility  in  the  pcrepartial
-       documentation.
+       When pcre_dfa_exec() returns a partial match, it is possible to call it
+       again,  with  additional  subject characters, and have it continue with
+       the same match. The PCRE_DFA_RESTART option requests this action;  when
+       it  is  set,  the workspace and wscount options must reference the same
+       vector as before because data about the match so far is  left  in  them
+       after a partial match. There is more discussion of this facility in the
+       pcrepartial documentation.


    Successful returns from pcre_dfa_exec()


-       When  pcre_dfa_exec()  succeeds, it may have matched more than one sub-
+       When pcre_dfa_exec() succeeds, it may have matched more than  one  sub-
        string in the subject. Note, however, that all the matches from one run
-       of  the  function  start  at the same point in the subject. The shorter
-       matches are all initial substrings of the longer matches. For  example,
+       of the function start at the same point in  the  subject.  The  shorter
+       matches  are all initial substrings of the longer matches. For example,
        if the pattern


          <.*>
@@ -2569,61 +2577,61 @@
          <something> <something else>
          <something> <something else> <something further>


-       On  success,  the  yield of the function is a number greater than zero,
-       which is the number of matched substrings.  The  substrings  themselves
-       are  returned  in  ovector. Each string uses two elements; the first is
-       the offset to the start, and the second is the offset to  the  end.  In
-       fact,  all  the  strings  have the same start offset. (Space could have
-       been saved by giving this only once, but it was decided to retain  some
-       compatibility  with  the  way pcre_exec() returns data, even though the
+       On success, the yield of the function is a number  greater  than  zero,
+       which  is  the  number of matched substrings. The substrings themselves
+       are returned in ovector. Each string uses two elements;  the  first  is
+       the  offset  to  the start, and the second is the offset to the end. In
+       fact, all the strings have the same start  offset.  (Space  could  have
+       been  saved by giving this only once, but it was decided to retain some
+       compatibility with the way pcre_exec() returns data,  even  though  the
        meaning of the strings is different.)


        The strings are returned in reverse order of length; that is, the long-
-       est  matching  string is given first. If there were too many matches to
-       fit into ovector, the yield of the function is zero, and the vector  is
+       est matching string is given first. If there were too many  matches  to
+       fit  into ovector, the yield of the function is zero, and the vector is
        filled with the longest matches.


    Error returns from pcre_dfa_exec()


-       The  pcre_dfa_exec()  function returns a negative number when it fails.
-       Many of the errors are the same  as  for  pcre_exec(),  and  these  are
-       described  above.   There are in addition the following errors that are
+       The pcre_dfa_exec() function returns a negative number when  it  fails.
+       Many  of  the  errors  are  the  same as for pcre_exec(), and these are
+       described above.  There are in addition the following errors  that  are
        specific to pcre_dfa_exec():


          PCRE_ERROR_DFA_UITEM      (-16)


-       This return is given if pcre_dfa_exec() encounters an item in the  pat-
-       tern  that  it  does not support, for instance, the use of \C or a back
+       This  return is given if pcre_dfa_exec() encounters an item in the pat-
+       tern that it does not support, for instance, the use of \C  or  a  back
        reference.


          PCRE_ERROR_DFA_UCOND      (-17)


-       This return is given if pcre_dfa_exec()  encounters  a  condition  item
-       that  uses  a back reference for the condition, or a test for recursion
+       This  return  is  given  if pcre_dfa_exec() encounters a condition item
+       that uses a back reference for the condition, or a test  for  recursion
        in a specific group. These are not supported.


          PCRE_ERROR_DFA_UMLIMIT    (-18)


-       This return is given if pcre_dfa_exec() is called with an  extra  block
+       This  return  is given if pcre_dfa_exec() is called with an extra block
        that contains a setting of the match_limit field. This is not supported
        (it is meaningless).


          PCRE_ERROR_DFA_WSSIZE     (-19)


-       This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the
+       This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the
        workspace vector.


          PCRE_ERROR_DFA_RECURSE    (-20)


-       When  a  recursive subpattern is processed, the matching function calls
-       itself recursively, using private vectors for  ovector  and  workspace.
-       This  error  is  given  if  the output vector is not large enough. This
+       When a recursive subpattern is processed, the matching  function  calls
+       itself  recursively,  using  private vectors for ovector and workspace.
+       This error is given if the output vector  is  not  large  enough.  This
        should be extremely rare, as a vector of size 1000 is used.
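
For diagnostics, the codes specific to pcre_dfa_exec() could be turned into short messages along the following lines. This is a sketch only: dfa_error_text() is a hypothetical name, and the numeric values are copied from the list above and guarded so that the real definitions are used if pcre.h has already been included.

```c
#include <assert.h>
#include <stddef.h>

/* Values as quoted in the text; skipped if pcre.h already defined them. */
#ifndef PCRE_ERROR_DFA_UITEM
#define PCRE_ERROR_DFA_UITEM   (-16)
#define PCRE_ERROR_DFA_UCOND   (-17)
#define PCRE_ERROR_DFA_UMLIMIT (-18)
#define PCRE_ERROR_DFA_WSSIZE  (-19)
#define PCRE_ERROR_DFA_RECURSE (-20)
#endif

/* Hypothetical helper: describe a DFA-specific error code, or return
   NULL for any other value (including the codes shared with
   pcre_exec(), which a caller would handle separately). */
const char *dfa_error_text(int rc)
{
    switch (rc) {
    case PCRE_ERROR_DFA_UITEM:
        return "pattern item not supported (for instance \\C or a back reference)";
    case PCRE_ERROR_DFA_UCOND:
        return "condition uses a back reference or a recursion test";
    case PCRE_ERROR_DFA_UMLIMIT:
        return "match_limit set in the extra block";
    case PCRE_ERROR_DFA_WSSIZE:
        return "workspace vector too small";
    case PCRE_ERROR_DFA_RECURSE:
        return "output vector too small during recursion";
    default:
        return NULL;
    }
}
```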



SEE ALSO

-       pcrebuild(3), pcrecallout(3), pcrecpp(3)(3), pcrematching(3),  pcrepar-
+       pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3), pcrepartial(3),
        pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).



@@ -2636,11 +2644,11 @@

REVISION

-       Last updated: 11 April 2009
+       Last updated: 01 September 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCRECALLOUT(3)                                                  PCRECALLOUT(3)



@@ -2815,8 +2823,8 @@
        Last updated: 15 March 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCRECOMPAT(3)                                                    PCRECOMPAT(3)



@@ -2829,7 +2837,7 @@
        This  document describes the differences in the ways that PCRE and Perl
        handle regular expressions. The differences described here  are  mainly
        with  respect  to  Perl 5.8, though PCRE versions 7.0 and later contain
-       some features that are expected to be in the forthcoming Perl 5.10.
+       some features that are in Perl 5.10.


        1. PCRE has only a subset of Perl's UTF-8 and Unicode support.  Details
        of  what  it does have are given in the section on UTF-8 support in the
@@ -2953,11 +2961,11 @@


REVISION

-       Last updated: 11 September 2007
-       Copyright (c) 1997-2007 University of Cambridge.
+       Last updated: 25 August 2009
+       Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCREPATTERN(3)                                                  PCREPATTERN(3)



@@ -5034,8 +5042,8 @@
        Last updated: 11 April 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCRESYNTAX(3)                                                    PCRESYNTAX(3)



@@ -5387,8 +5395,8 @@
        Last updated: 11 April 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCREPARTIAL(3)                                                  PCREPARTIAL(3)



@@ -5412,77 +5420,162 @@

        If the application sees the user's keystrokes one by one, and can check
        that what has been typed so far is potentially valid,  it  is  able  to
-       raise  an  error as soon as a mistake is made, possibly beeping and not
-       reflecting the character that has been typed. This  immediate  feedback
-       is  likely  to  be a better user interface than a check that is delayed
-       until the entire string has been entered.
+       raise  an  error  as  soon  as  a  mistake  is made, by beeping and not
+       reflecting the character that has been typed, for example. This immedi-
+       ate  feedback is likely to be a better user interface than a check that
+       is delayed until the entire string has been entered.  Partial  matching
+       can  also  sometimes be useful when the subject string is very long and
+       is not all available at once.


-       PCRE supports the concept of partial matching by means of the PCRE_PAR-
-       TIAL   option,   which   can   be   set  when  calling  pcre_exec()  or
-       pcre_dfa_exec(). When this flag is set for pcre_exec(), the return code
-       PCRE_ERROR_NOMATCH  is converted into PCRE_ERROR_PARTIAL if at any time
-       during the matching process the last part of the subject string matched
-       part  of  the  pattern. Unfortunately, for non-anchored matching, it is
-       not possible to obtain the position of the start of the partial  match.
-       No captured data is set when PCRE_ERROR_PARTIAL is returned.
+       PCRE supports partial matching by means of  the  PCRE_PARTIAL_SOFT  and
+       PCRE_PARTIAL_HARD options, which can be set when calling pcre_exec() or
+       pcre_dfa_exec(). For backwards compatibility, PCRE_PARTIAL is a synonym
+       for PCRE_PARTIAL_SOFT. The essential difference between the two options
+       is whether or not a partial match is preferred to an  alternative  com-
+       plete  match,  though the details differ between the two matching func-
+       tions. If both options are set, PCRE_PARTIAL_HARD takes precedence.


-       When   PCRE_PARTIAL   is  set  for  pcre_dfa_exec(),  the  return  code
-       PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the  end  of
-       the  subject is reached, there have been no complete matches, but there
-       is still at least one matching possibility. The portion of  the  string
-       that provided the partial match is set as the first matching string.
+       Setting a partial matching option disables one of PCRE's optimizations.
+       PCRE  remembers the last literal byte in a pattern, and abandons match-
+       ing immediately if such a byte is not present in  the  subject  string.
+       This  optimization cannot be used for a subject string that might match
+       only partially.


-       Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers
-       the last literal byte in a pattern, and abandons  matching  immediately
-       if  such a byte is not present in the subject string. This optimization
-       cannot be used for a subject string that might match only partially.


+PARTIAL MATCHING USING pcre_exec()

-RESTRICTED PATTERNS FOR PCRE_PARTIAL
+       A partial match occurs during a call to pcre_exec() whenever the end of
+       the  subject  string  is reached successfully, but matching cannot con-
+       tinue because more characters are needed. However, at least one charac-
+       ter  must have been matched. (In other words, a partial match can never
+       be an empty string.)


-       Because of the way certain internal optimizations  are  implemented  in
-       the  pcre_exec()  function, the PCRE_PARTIAL option cannot be used with
-       all patterns. These restrictions do not apply when  pcre_dfa_exec()  is
-       used.  For pcre_exec(), repeated single characters such as
+       If PCRE_PARTIAL_SOFT is set,  the  partial  match  is  remembered,  but
+       matching continues as normal, and other alternatives in the pattern are
+       tried.  If  no  complete  match  can  be  found,  pcre_exec()   returns
+       PCRE_ERROR_PARTIAL  instead  of PCRE_ERROR_NOMATCH, and if there are at
+       least two slots in the offsets vector, they are filled in with the off-
+       sets  of  the longest string that partially matched. Consider this pat-
+       tern:


-         a{2,4}
+         /123\w+X|dogY/


-       and repeated single metasequences such as
+       If this is matched against the subject string "abc123dog", both  alter-
+       natives  fail  to  match,  but the end of the subject is reached during
+       matching,   so    PCRE_ERROR_PARTIAL    is    returned    instead    of
+       PCRE_ERROR_NOMATCH.  The  offsets  are  set  to  3  and  9, identifying
+       "123dog" as the longest partial match that was found. (In this example,
+       there  are  two  partial  matches,  because  "dog" on its own partially
+       matches the second alternative.)
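
The way a caller might pick up the partially matched text can be sketched as follows. The helper partial_span() is a hypothetical name, not a PCRE function, and the value -12 for PCRE_ERROR_PARTIAL is taken from the error list in pcreapi; the guard lets the real definition win if pcre.h is included.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef PCRE_ERROR_PARTIAL
#define PCRE_ERROR_PARTIAL (-12)
#endif

/* Hypothetical helper: if rc reports a partial match and the offsets
   vector has at least two slots, return a pointer to the start of the
   partially matched text and store its length in *len. Otherwise
   return NULL and leave *len untouched. */
const char *partial_span(const char *subject, int rc,
                         const int *ovector, int ovecsize, size_t *len)
{
    assert(subject != NULL && len != NULL);
    if (rc != PCRE_ERROR_PARTIAL || ovector == NULL || ovecsize < 2)
        return NULL;
    if (ovector[0] < 0 || ovector[1] < ovector[0])
        return NULL;

    *len = (size_t)(ovector[1] - ovector[0]);
    return subject + ovector[0];
}
```

For the subject "abc123dog" with offsets 3 and 9, the sketch yields a span of length 6 beginning at "123dog".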


-         \d+
+       If PCRE_PARTIAL_HARD is set for pcre_exec(), it returns PCRE_ERROR_PAR-
+       TIAL  as soon as a partial match is found, without continuing to search
+       for possible complete matches. The difference between the  two  options
+       can be illustrated by a pattern such as:


-       are  not permitted if the maximum number of occurrences is greater than
-       one.  Optional items such as \d? (where the maximum is one) are permit-
-       ted.   Quantifiers  with any values are permitted after parentheses, so
-       the invalid examples above can be coded thus:
+         /dog(sbody)?/


-         (a){2,4}
-         (\d)+
+       This  matches either "dog" or "dogsbody", greedily (that is, it prefers
+       the longer string if possible). If it is  matched  against  the  string
+       "dog"  with  PCRE_PARTIAL_SOFT,  it  yields a complete match for "dog".
+       However, if PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL.
+       On  the  other hand, if the pattern is made ungreedy the result is dif-
+       ferent:


-       These constructions run more slowly, but for the kinds  of  application
-       that  are  envisaged  for this facility, this is not felt to be a major
-       restriction.
+         /dog(sbody)??/


-       If PCRE_PARTIAL is set for a pattern  that  does  not  conform  to  the
-       restrictions,  pcre_exec() returns the error code PCRE_ERROR_BADPARTIAL
-       (-13).  You can use the PCRE_INFO_OKPARTIAL call to pcre_fullinfo()  to
-       find out if a compiled pattern can be used for partial matching.
+       In this case the result is always a complete match because  pcre_exec()
+       finds  that  first,  and  it  never continues after finding a match. It
+       might be easier to follow this explanation by thinking of the two  pat-
+       terns like this:


+         /dog(sbody)?/    is the same as  /dogsbody|dog/
+         /dog(sbody)??/   is the same as  /dog|dogsbody/


+       The  second  pattern  will  never  match "dogsbody" when pcre_exec() is
+       used, because it will always find the shorter match first.
+
+
+PARTIAL MATCHING USING pcre_dfa_exec()
+
+       The pcre_dfa_exec() function moves along the subject  string  character
+       by  character, without backtracking, searching for all possible matches
+       simultaneously. If the end of the subject is reached before the end  of
+       the  pattern,  there  is the possibility of a partial match, again pro-
+       vided that at least one character has matched.
+
+       When PCRE_PARTIAL_SOFT is set, PCRE_ERROR_PARTIAL is returned  only  if
+       there  have  been  no complete matches. Otherwise, the complete matches
+       are returned.  However, if PCRE_PARTIAL_HARD is set,  a  partial  match
+       takes  precedence  over any complete matches. The portion of the string
+       that provided the longest partial match is set as  the  first  matching
+       string, provided there are at least two slots in the offsets vector.
+
+       Because  pcre_dfa_exec()  always searches for all possible matches, and
+       there is no difference between greedy and ungreedy repetition, its  be-
+       haviour is different from pcre_exec when PCRE_PARTIAL_HARD is set. Con-
+       sider the string "dog"  matched  against  the  ungreedy  pattern  shown
+       above:
+
+         /dog(sbody)??/
+
+       Whereas  pcre_exec()  stops  as soon as it finds the complete match for
+       "dog", pcre_dfa_exec() also finds the partial match for "dogsbody", and
+       so returns that when PCRE_PARTIAL_HARD is set.
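
The precedence rules for pcre_dfa_exec() described in this section can be modelled as a small decision function. This is a toy model of the documented behaviour, not library code: the enum names and dfa_outcome() are hypothetical, and "has_partial" stands for "the end of the subject was reached with at least one possibility still needing more characters".

```c
#include <assert.h>

enum partial_mode { PARTIAL_NONE, PARTIAL_SOFT, PARTIAL_HARD };
enum outcome { OUTCOME_NOMATCH, OUTCOME_COMPLETE, OUTCOME_PARTIAL };

/* Hypothetical model of which result is reported, given what the
   scan found and which partial-matching option was set. */
enum outcome dfa_outcome(int has_complete, int has_partial,
                         enum partial_mode mode)
{
    assert(mode == PARTIAL_NONE || mode == PARTIAL_SOFT ||
           mode == PARTIAL_HARD);

    /* HARD: a partial match takes precedence over complete matches. */
    if (mode == PARTIAL_HARD && has_partial) return OUTCOME_PARTIAL;

    /* Otherwise complete matches, if any, are returned. */
    if (has_complete) return OUTCOME_COMPLETE;

    /* SOFT: a partial match replaces "no match". */
    if (mode == PARTIAL_SOFT && has_partial) return OUTCOME_PARTIAL;

    return OUTCOME_NOMATCH;
}
```

For "dog" against /dog(sbody)??/ both a complete and a partial match exist, so the model gives a complete match under PARTIAL_SOFT and a partial match under PARTIAL_HARD, as described above.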
+
+
+PARTIAL MATCHING AND WORD BOUNDARIES
+
+       If a pattern ends with one of the sequences \b or \B, which test for
+       word boundaries, partial matching with PCRE_PARTIAL_SOFT can give
+       counter-intuitive results. Consider this pattern:
+
+         /\bcat\b/
+
+       This matches "cat", provided there is a word boundary at either end. If
+       the subject string is "the cat", the comparison of the final "t" with a
+       following  character  cannot  take  place, so a partial match is found.
+       However, pcre_exec() carries on with normal matching, which matches  \b
+       at  the  end  of  the subject when the last character is a letter, thus
+       finding a complete match. The result, therefore, is not PCRE_ERROR_PAR-
+       TIAL.  The  same  thing  happens  with pcre_dfa_exec(), because it also
+       finds the complete match.
+
+       Using PCRE_PARTIAL_HARD in this  case  does  yield  PCRE_ERROR_PARTIAL,
+       because then the partial match takes precedence.
+
+
+FORMERLY RESTRICTED PATTERNS
+
+       For releases of PCRE prior to 8.00, because of the way certain internal
+       optimizations  were  implemented  in  the  pcre_exec()  function,   the
+       PCRE_PARTIAL  option  (predecessor  of  PCRE_PARTIAL_SOFT) could not be
+       used with all patterns. From release 8.00 onwards, the restrictions  no
+       longer  apply,  and  partial matching with pcre_exec() can be requested
+       for any pattern.
+
+       Items that were formerly restricted were repeated single characters and
+       repeated  metasequences. If PCRE_PARTIAL was set for a pattern that did
+       not conform to the restrictions, pcre_exec() returned  the  error  code
+       PCRE_ERROR_BADPARTIAL  (-13).  This error code is no longer in use. The
+       PCRE_INFO_OKPARTIAL call to pcre_fullinfo() to find out if  a  compiled
+       pattern can be used for partial matching now always returns 1.
+
+
 EXAMPLE OF PARTIAL MATCHING USING PCRETEST


        If  the  escape  sequence  \P  is  present in a pcretest data line, the
-       PCRE_PARTIAL flag is used for the match. Here is a run of pcretest that
-       uses the date example quoted above:
+       PCRE_PARTIAL_SOFT option is used for  the  match.  Here  is  a  run  of
+       pcretest that uses the date example quoted above:


            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
          data> 25jun04\P
           0: 25jun04
           1: jun
          data> 25dec3\P
-         Partial match
+         Partial match: 25dec3
          data> 3ju\P
-         Partial match
+         Partial match: 3ju
          data> 3juj\P
          No match
          data> j\P
@@ -5490,36 +5583,23 @@


        The  first  data  string  is  matched completely, so pcretest shows the
        matched substrings. The remaining four strings do not  match  the  com-
-       plete  pattern,  but  the first two are partial matches. The same test,
-       using pcre_dfa_exec() matching (by means of the  \D  escape  sequence),
-       produces the following output:
+       plete pattern, but the first two are partial matches. Similar output is
+       obtained when pcre_dfa_exec() is used.


-           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
-         data> 25jun04\P\D
-          0: 25jun04
-         data> 23dec3\P\D
-         Partial match: 23dec3
-         data> 3ju\P\D
-         Partial match: 3ju
-         data> 3juj\P\D
-         No match
-         data> j\P\D
-         No match
+       If the escape sequence \P is present more than once in a pcretest  data
+       line, the PCRE_PARTIAL_HARD option is set for the match.


-       Notice  that in this case the portion of the string that was matched is
-       made available.


-
MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()

        When a partial match has been found using pcre_dfa_exec(), it is possi-
-       ble  to  continue  the  match  by providing additional subject data and
-       calling pcre_dfa_exec() again with the same  compiled  regular  expres-
-       sion, this time setting the PCRE_DFA_RESTART option. You must also pass
-       the same working space as before, because this is where details of  the
-       previous  partial  match are stored. Here is an example using pcretest,
-       using the \R escape sequence to set the PCRE_DFA_RESTART option (\P and
-       \D are as above):
+       ble to continue the match by  providing  additional  subject  data  and
+       calling  pcre_dfa_exec()  again  with the same compiled regular expres-
+       sion, this time setting the PCRE_DFA_RESTART option. You must pass  the
+       same working space as before, because this is where details of the pre-
+       vious partial match are stored. Here  is  an  example  using  pcretest,
+       using  the  \R  escape  sequence to set the PCRE_DFA_RESTART option (\D
+       specifies the use of pcre_dfa_exec()):


            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
          data> 23ja\P\D
@@ -5527,38 +5607,71 @@
          data> n05\R\D
           0: n05


-       The  first  call has "23ja" as the subject, and requests partial match-
-       ing; the second call  has  "n05"  as  the  subject  for  the  continued
-       (restarted)  match.   Notice  that when the match is complete, only the
-       last part is shown; PCRE does  not  retain  the  previously  partially-
-       matched  string. It is up to the calling program to do that if it needs
+       The first call has "23ja" as the subject, and requests  partial  match-
+       ing;  the  second  call  has  "n05"  as  the  subject for the continued
+       (restarted) match.  Notice that when the match is  complete,  only  the
+       last  part  is  shown;  PCRE  does not retain the previously partially-
+       matched string. It is up to the calling program to do that if it  needs
        to.


-       You can set PCRE_PARTIAL  with  PCRE_DFA_RESTART  to  continue  partial
-       matching over multiple segments. This facility can be used to pass very
-       long subject strings to pcre_dfa_exec(). However, some care  is  needed
-       for certain types of pattern.
+       You  can  set  the  PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with
+       PCRE_DFA_RESTART to continue partial matching over  multiple  segments.
+       This  facility  can  be  used  to  pass  very  long  subject strings to
+       pcre_dfa_exec().


-       1.  If  the  pattern contains tests for the beginning or end of a line,
-       you need to pass the PCRE_NOTBOL or PCRE_NOTEOL options,  as  appropri-
-       ate,  when  the subject string for any call does not contain the begin-
+
+MULTI-SEGMENT MATCHING WITH pcre_exec()
+
+       From release 8.00, pcre_exec() can also be  used  to  do  multi-segment
+       matching.  Unlike  pcre_dfa_exec(),  it  is not possible to restart the
+       previous match with a new segment of data. Instead, new  data  must  be
+       added  to  the  previous  subject  string, and the entire match re-run,
+       starting from the point where the partial match occurred. Earlier  data
+       can be discarded.  Consider an unanchored pattern that matches dates:
+
+           re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
+         data> The date is 23ja\P
+         Partial match: 23ja
+
+       At this stage, an application could discard the text preceding  "23ja",
+       add on text from the next segment, and call pcre_exec()  again.  Unlike
+       pcre_dfa_exec(),  the  entire matching string must always be available,
+       and the complete matching process occurs for each call, so more  memory
+       and more processing time are needed.
+
+
+ISSUES WITH MULTI-SEGMENT MATCHING
+
+       Certain types of pattern may give problems with multi-segment matching,
+       whichever matching function is used.
+
+       1. If the pattern contains tests for the beginning or end  of  a  line,
+       you  need  to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropri-
+       ate, when the subject string for any call does not contain  the  begin-
        ning or end of a line.


-       2. If the pattern contains backward assertions (including  \b  or  \B),
-       you  need  to  arrange for some overlap in the subject strings to allow
-       for this. For example, you could pass the subject in  chunks  that  are
-       500  bytes long, but in a buffer of 700 bytes, with the starting offset
-       set to 200 and the previous 200 bytes at the start of the buffer.
+       2.  If  the  pattern contains backward assertions (including \b or \B),
+       you need to arrange for some overlap in the subject  strings  to  allow
+       for  them  to  be  correctly tested at the start of each substring. For
+       example, using pcre_dfa_exec(), you could pass the  subject  in  chunks
+       that  are 500 bytes long, but in a buffer of 700 bytes, with the start-
+       ing offset set to 200 and the previous 200 bytes at the  start  of  the
+       buffer.


-       3. Matching a subject string that is split into multiple segments  does
-       not  always produce exactly the same result as matching over one single
-       long string.  The difference arises when there  are  multiple  matching
-       possibilities,  because a partial match result is given only when there
-       are no completed matches in a call to pcre_dfa_exec(). This means  that
-       as  soon  as  the  shortest match has been found, continuation to a new
-       subject segment is no longer possible.  Consider this pcretest example:
+       3.  Matching  a subject string that is split into multiple segments may
+       not always produce exactly the same result as matching over one  single
+       long  string,  especially  when  PCRE_PARTIAL_SOFT is used. The section
+       "Partial Matching and Word Boundaries" above describes  an  issue  that
+       arises  if  the  pattern ends with \b or \B. Another kind of difference
+       may occur when there are multiple  matching  possibilities,  because  a
+       partial match result is given only when there are no completed matches.
+       This means that as soon as the shortest match has been found, continua-
+       tion  to  a  new subject segment is no longer possible.  Consider again
+       this pcretest example:


            re> /dog(sbody)?/
+         data> dogsb\P
+          0: dog
          data> do\P\D
          Partial match: do
          data> gsb\R\P\D
@@ -5567,18 +5680,31 @@
           0: dogsbody
           1: dog


-       The pattern matches the words "dog" or "dogsbody". When the subject  is
-       presented  in  several  parts  ("do" and "gsb" being the first two) the
-       match stops when "dog" has been found, and it is not possible  to  con-
-       tinue.  On  the  other  hand,  if  "dogsbody"  is presented as a single
-       string, both matches are found.
+       The first data line passes the string "dogsb" to  pcre_exec(),  setting
+       the  PCRE_PARTIAL_SOFT  option.  Although the string is a partial match
+       for "dogsbody", the  result  is  not  PCRE_ERROR_PARTIAL,  because  the
+       shorter  string  "dog" is a complete match. Similarly, when the subject
+       is presented to pcre_dfa_exec() in several parts ("do" and "gsb"  being
+       the first two) the match stops when "dog" has been found, and it is not
+       possible to continue. On the other hand, if "dogsbody" is presented  as
+       a single string, pcre_dfa_exec() finds both matches.


-       Because of this phenomenon, it does not usually make  sense  to  end  a
-       pattern that is going to be matched in this way with a variable repeat.
+       Because of these problems, it is probably best to use PCRE_PARTIAL_HARD
+       when matching multi-segment data. The example above then  behaves  dif-
+       ferently:


+           re> /dog(sbody)?/
+         data> dogsb\P\P
+         Partial match: dogsb
+         data> do\P\D
+         Partial match: do
+         data> gsb\R\P\P\D
+         Partial match: gsb
+
+
        4. Patterns that contain alternatives at the top level which do not all
-       start with the same pattern item may not work as expected. For example,
-       consider this pattern:
+       start with the  same  pattern  item  may  not  work  as  expected  when
+       pcre_dfa_exec() is used. For example, consider this pattern:


          1234|3789


@@ -5586,16 +5712,25 @@
        first alternative is found at offset 3. There is no partial  match  for
        the second alternative, because such a match does not start at the same
        point in the subject string. Attempting to  continue  with  the  string
-       "789" does not yield a match because only those alternatives that match
-       at one point in the subject are remembered. The problem arises  because
-       the  start  of the second alternative matches within the first alterna-
-       tive. There is no problem with anchored patterns or patterns such as:
+       "7890"  does  not  yield  a  match because only those alternatives that
+       match at one point in the subject are remembered.  The  problem  arises
+       because  the  start  of the second alternative matches within the first
+       alternative. There is no problem with  anchored  patterns  or  patterns
+       such as:


          1234|ABCD


-       where no string can be a partial match for both alternatives.
+       where  no  string can be a partial match for both alternatives. This is
+       not a problem if pcre_exec() is used, because the entire match  has  to
+       be rerun each time:


+           re> /1234|3789/
+         data> ABC123\P
+         Partial match: 123
+         data> 1237890
+          0: 3789


+
AUTHOR

        Philip Hazel
@@ -5605,11 +5740,11 @@


REVISION

-       Last updated: 04 June 2007
-       Copyright (c) 1997-2007 University of Cambridge.
+       Last updated: 31 August 2009
+       Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCREPRECOMPILE(3)                                            PCREPRECOMPILE(3)



@@ -5732,8 +5867,8 @@
        Last updated: 13 June 2007
        Copyright (c) 1997-2007 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCREPERFORM(3)                                                  PCREPERFORM(3)



@@ -5882,8 +6017,8 @@
        Last updated: 06 March 2007
        Copyright (c) 1997-2007 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCREPOSIX(3)                                                      PCREPOSIX(3)



@@ -6001,7 +6136,11 @@
        is  public: re_nsub contains the number of capturing subpatterns in the
        regular expression. Various error codes are defined in the header file.


+       NOTE: If the yield of regcomp() is non-zero, you must  not  attempt  to
+       use the contents of the preg structure. If, for example, you pass it to
+       regexec(), the result is undefined and your program is likely to crash.


+
MATCHING NEWLINE CHARACTERS

        This area is not simple, because POSIX and Perl take different views of
@@ -6118,11 +6257,11 @@


REVISION

-       Last updated: 11 March 2009
+       Last updated: 15 August 2009
        Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCRECPP(3)                                                          PCRECPP(3)



@@ -6462,8 +6601,8 @@

        Last updated: 17 March 2009
 ------------------------------------------------------------------------------
-
-
+ 
+ 
 PCRESAMPLE(3)                                                    PCRESAMPLE(3)



@@ -6474,53 +6613,56 @@
PCRE SAMPLE PROGRAM

        A simple, complete demonstration program, to get you started with using
-       PCRE, is supplied in the file pcredemo.c in the PCRE distribution.
+       PCRE, is supplied in the file pcredemo.c in the  PCRE  distribution.  A
+       listing  of this program is given in the pcredemo documentation. If you
+       do not have a copy of the PCRE distribution, you can save this  listing
+       to re-create pcredemo.c.


        The program compiles the regular expression that is its first argument,
-       and  matches  it  against the subject string in its second argument. No
-       PCRE options are set, and default character tables are used. If  match-
-       ing  succeeds,  the  program  outputs  the  portion of the subject that
+       and matches it against the subject string in its  second  argument.  No
+       PCRE  options are set, and default character tables are used. If match-
+       ing succeeds, the program outputs  the  portion  of  the  subject  that
        matched, together with the contents of any captured substrings.


        If the -g option is given on the command line, the program then goes on
        to check for further matches of the same regular expression in the same
-       subject string. The logic is a little bit tricky because of the  possi-
-       bility  of  matching an empty string. Comments in the code explain what
+       subject  string. The logic is a little bit tricky because of the possi-
+       bility of matching an empty string. Comments in the code  explain  what
        is going on.


-       If PCRE is installed in the standard include  and  library  directories
-       for  your  system, you should be able to compile the demonstration pro-
+       If  PCRE  is  installed in the standard include and library directories
+       for your system, you should be able to compile the  demonstration  pro-
        gram using this command:


          gcc -o pcredemo pcredemo.c -lpcre


-       If PCRE is installed elsewhere, you may need to add additional  options
-       to  the  command line. For example, on a Unix-like system that has PCRE
-       installed in /usr/local, you  can  compile  the  demonstration  program
+       If  PCRE is installed elsewhere, you may need to add additional options
+       to the command line. For example, on a Unix-like system that  has  PCRE
+       installed  in  /usr/local,  you  can  compile the demonstration program
        using a command like this:


          gcc -o pcredemo -I/usr/local/include pcredemo.c \
              -L/usr/local/lib -lpcre


-       Once  you  have  compiled the demonstration program, you can run simple
+       Once you have compiled the demonstration program, you  can  run  simple
        tests like this:


          ./pcredemo 'cat|dog' 'the cat sat on the mat'
          ./pcredemo -g 'cat|dog' 'the dog sat on the cat'


-       Note that there is a  much  more  comprehensive  test  program,  called
-       pcretest,  which  supports  many  more  facilities  for testing regular
+       Note  that  there  is  a  much  more comprehensive test program, called
+       pcretest, which supports  many  more  facilities  for  testing  regular
        expressions and the PCRE library. The pcredemo program is provided as a
        simple coding example.


-       On some operating systems (e.g. Solaris), when PCRE is not installed in
-       the standard library directory, you may get an error like this when you
-       try to run pcredemo:
+       If PCRE is not installed in the standard library  directory,  you  may
+       get an error like this on some operating systems (e.g. Solaris) when
+       you try to run pcredemo:


-         ld.so.1:  a.out:  fatal:  libpcre.so.0:  open failed: No such file or
+         ld.so.1: a.out: fatal: libpcre.so.0: open failed:  No  such  file  or
        directory


-       This is caused by the way shared library support works  on  those  sys-
+       This  is  caused  by the way shared library support works on those sys-
        tems. You need to add


          -R/usr/local/lib
@@ -6537,8 +6679,8 @@


REVISION

-       Last updated: 23 January 2008
-       Copyright (c) 1997-2008 University of Cambridge.
+       Last updated: 01 September 2009
+       Copyright (c) 1997-2009 University of Cambridge.
 ------------------------------------------------------------------------------
 PCRESTACK(3)                                                      PCRESTACK(3)


@@ -6676,5 +6818,5 @@
        Last updated: 09 July 2008
        Copyright (c) 1997-2008 University of Cambridge.
 ------------------------------------------------------------------------------
-
-
+ 
+ 


Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/pcreapi.3    2009-09-01 16:10:16 UTC (rev 429)
@@ -135,9 +135,13 @@
 The functions \fBpcre_compile()\fP, \fBpcre_compile2()\fP, \fBpcre_study()\fP,
 and \fBpcre_exec()\fP are used for compiling and matching regular expressions
 in a Perl-compatible manner. A sample program that demonstrates the simplest
-way of using them is provided in the file called \fIpcredemo.c\fP in the source
-distribution. The
+way of using them is provided in the file called \fIpcredemo.c\fP in the PCRE
+source distribution. A listing of this program is given in the
 .\" HREF
+\fBpcredemo\fP
+.\"
+documentation, and the
+.\" HREF
 \fBpcresample\fP
 .\"
 documentation describes how to compile and run it.
@@ -1327,7 +1331,11 @@
 matching a null string by first trying the match again at the same offset with
 PCRE_NOTEMPTY and PCRE_ANCHORED, and then if that fails by advancing the
 starting offset (see below) and trying an ordinary match again. There is some
-code that demonstrates how to do this in the \fIpcredemo.c\fP sample program.
+code that demonstrates how to do this in the 
+.\" HREF
+\fBpcredemo\fP
+.\"
+sample program.
 .sp
   PCRE_NO_START_OPTIMIZE
 .sp
@@ -2003,6 +2011,6 @@
 .rs
 .sp
 .nf
-Last updated: 29 August 2009
+Last updated: 01 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi


Added: code/trunk/doc/pcredemo.3
===================================================================
--- code/trunk/doc/pcredemo.3                            (rev 0)
+++ code/trunk/doc/pcredemo.3    2009-09-01 16:10:16 UTC (rev 429)
@@ -0,0 +1,352 @@
+.\" Start example.
+.de EX
+.  nr mE \\n(.f
+.  nf
+.  nh
+.  ft CW
+..
+.
+.
+.\" End example.
+.de EE
+.  ft \\n(mE
+.  fi
+.  hy \\n(HY
+..
+.
+.EX
+/*************************************************
+*           PCRE DEMONSTRATION PROGRAM           *
+*************************************************/
+
+/* This is a demonstration program to illustrate the most straightforward ways
+of calling the PCRE regular expression library from a C program. See the
+pcresample documentation for a short discussion ("man pcresample" if you have
+the PCRE man pages installed).
+
+In Unix-like environments, compile this program thuswise:
+
+  gcc -Wall pcredemo.c -I/usr/local/include -L/usr/local/lib \e
+    -R/usr/local/lib -lpcre
+
+Replace "/usr/local/include" and "/usr/local/lib" with wherever the include and
+library files for PCRE are installed on your system. You don't need -I and -L
+if PCRE is installed in the standard system libraries. Only some operating
+systems (e.g. Solaris) use the -R option.
+
+Building under Windows:
+
+If you want to statically link this program against a non-dll .a file, you must
+define PCRE_STATIC before including pcre.h, otherwise the pcre_malloc() and
+pcre_free() exported functions will be declared __declspec(dllimport), with
+unwanted results. So in this environment, uncomment the following line. */
+
+/* #define PCRE_STATIC */
+
+#include <stdio.h>
+#include <string.h>
+#include <pcre.h>
+
+#define OVECCOUNT 30    /* should be a multiple of 3 */
+
+
+int main(int argc, char **argv)
+{
+pcre *re;
+const char *error;
+char *pattern;
+char *subject;
+unsigned char *name_table;
+int erroffset;
+int find_all;
+int namecount;
+int name_entry_size;
+int ovector[OVECCOUNT];
+int subject_length;
+int rc, i;
+
+
+/**************************************************************************
+* First, sort out the command line. There is only one possible option at  *
+* the moment, "-g" to request repeated matching to find all occurrences,  *
+* like Perl's /g option. We set the variable find_all to a non-zero value *
+* if the -g option is present. Apart from that, there must be exactly two *
+* arguments.                                                              *
+**************************************************************************/
+
+find_all = 0;
+for (i = 1; i < argc; i++)
+  {
+  if (strcmp(argv[i], "-g") == 0) find_all = 1;
+    else break;
+  }
+
+/* After the options, we require exactly two arguments, which are the pattern,
+and the subject string. */
+
+if (argc - i != 2)
+  {
+  printf("Two arguments required: a regex and a subject string\en");
+  return 1;
+  }
+
+pattern = argv[i];
+subject = argv[i+1];
+subject_length = (int)strlen(subject);
+
+
+/*************************************************************************
+* Now we are going to compile the regular expression pattern, and handle *
+* any errors that are detected.                                          *
+*************************************************************************/
+
+re = pcre_compile(
+  pattern,              /* the pattern */
+  0,                    /* default options */
+  &error,               /* for error message */
+  &erroffset,           /* for error offset */
+  NULL);                /* use default character tables */
+
+/* Compilation failed: print the error message and exit */
+
+if (re == NULL)
+  {
+  printf("PCRE compilation failed at offset %d: %s\en", erroffset, error);
+  return 1;
+  }
+
+
+/*************************************************************************
+* If the compilation succeeded, we call PCRE again, in order to do a     *
+* pattern match against the subject string. This does just ONE match. If *
+* further matching is needed, it will be done below.                     *
+*************************************************************************/
+
+rc = pcre_exec(
+  re,                   /* the compiled pattern */
+  NULL,                 /* no extra data - we didn't study the pattern */
+  subject,              /* the subject string */
+  subject_length,       /* the length of the subject */
+  0,                    /* start at offset 0 in the subject */
+  0,                    /* default options */
+  ovector,              /* output vector for substring information */
+  OVECCOUNT);           /* number of elements in the output vector */
+
+/* Matching failed: handle error cases */
+
+if (rc < 0)
+  {
+  switch(rc)
+    {
+    case PCRE_ERROR_NOMATCH: printf("No match\en"); break;
+    /*
+    Handle other special cases if you like
+    */
+    default: printf("Matching error %d\en", rc); break;
+    }
+  pcre_free(re);     /* Release memory used for the compiled pattern */
+  return 1;
+  }
+
+/* Match succeeded */
+
+printf("\enMatch succeeded at offset %d\en", ovector[0]);
+
+
+/*************************************************************************
+* We have found the first match within the subject string. If the output *
+* vector wasn't big enough, say so. Then output any substrings that were *
+* captured.                                                              *
+*************************************************************************/
+
+/* The output vector wasn't big enough */
+
+if (rc == 0)
+  {
+  rc = OVECCOUNT/3;
+  printf("ovector only has room for %d captured substrings\en", rc - 1);
+  }
+
+/* Show substrings stored in the output vector by number. Obviously, in a real
+application you might want to do things other than print them. */
+
+for (i = 0; i < rc; i++)
+  {
+  char *substring_start = subject + ovector[2*i];
+  int substring_length = ovector[2*i+1] - ovector[2*i];
+  printf("%2d: %.*s\en", i, substring_length, substring_start);
+  }
+
+
+/**************************************************************************
+* That concludes the basic part of this demonstration program. We have    *
+* compiled a pattern, and performed a single match. The code that follows *
+* shows first how to access named substrings, and then how to code for    *
+* repeated matches on the same subject.                                   *
+**************************************************************************/
+
+/* See if there are any named substrings, and if so, show them by name. First
+we have to extract the count of named parentheses from the pattern. */
+
+(void)pcre_fullinfo(
+  re,                   /* the compiled pattern */
+  NULL,                 /* no extra data - we didn't study the pattern */
+  PCRE_INFO_NAMECOUNT,  /* number of named substrings */
+  &namecount);          /* where to put the answer */
+
+if (namecount <= 0) printf("No named substrings\en"); else
+  {
+  unsigned char *tabptr;
+  printf("Named substrings\en");
+
+  /* Before we can access the substrings, we must extract the table for
+  translating names to numbers, and the size of each entry in the table. */
+
+  (void)pcre_fullinfo(
+    re,                       /* the compiled pattern */
+    NULL,                     /* no extra data - we didn't study the pattern */
+    PCRE_INFO_NAMETABLE,      /* address of the table */
+    &name_table);             /* where to put the answer */
+
+  (void)pcre_fullinfo(
+    re,                       /* the compiled pattern */
+    NULL,                     /* no extra data - we didn't study the pattern */
+    PCRE_INFO_NAMEENTRYSIZE,  /* size of each entry in the table */
+    &name_entry_size);        /* where to put the answer */
+
+  /* Now we can scan the table and, for each entry, print the number, the name,
+  and the substring itself. */
+
+  tabptr = name_table;
+  for (i = 0; i < namecount; i++)
+    {
+    int n = (tabptr[0] << 8) | tabptr[1];
+    printf("(%d) %*s: %.*s\en", n, name_entry_size - 3, tabptr + 2,
+      ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]);
+    tabptr += name_entry_size;
+    }
+  }
+
+
+/*************************************************************************
+* If the "-g" option was given on the command line, we want to continue  *
+* to search for additional matches in the subject string, in a similar   *
+* way to the /g option in Perl. This turns out to be trickier than you   *
+* might think because of the possibility of matching an empty string.    *
+* What happens is as follows:                                            *
+*                                                                        *
+* If the previous match was NOT for an empty string, we can just start   *
+* the next match at the end of the previous one.                         *
+*                                                                        *
+* If the previous match WAS for an empty string, we can't do that, as it *
+* would lead to an infinite loop. Instead, a special call of pcre_exec() *
+* is made with the PCRE_NOTEMPTY and PCRE_ANCHORED flags set. The first  *
+* of these tells PCRE that an empty string is not a valid match; other   *
+* possibilities must be tried. The second flag restricts PCRE to one     *
+* match attempt at the initial string position. If this match succeeds,  *
+* an alternative to the empty string match has been found, and we can    *
+* proceed round the loop.                                                *
+*************************************************************************/
+
+if (!find_all)
+  {
+  pcre_free(re);   /* Release the memory used for the compiled pattern */
+  return 0;        /* Finish unless -g was given */
+  }
+
+/* Loop for second and subsequent matches */
+
+for (;;)
+  {
+  int options = 0;                 /* Normally no options */
+  int start_offset = ovector[1];   /* Start at end of previous match */
+
+  /* If the previous match was for an empty string, we are finished if we are
+  at the end of the subject. Otherwise, arrange to run another match at the
+  same point to see if a non-empty match can be found. */
+
+  if (ovector[0] == ovector[1])
+    {
+    if (ovector[0] == subject_length) break;
+    options = PCRE_NOTEMPTY | PCRE_ANCHORED;
+    }
+
+  /* Run the next matching operation */
+
+  rc = pcre_exec(
+    re,                   /* the compiled pattern */
+    NULL,                 /* no extra data - we didn't study the pattern */
+    subject,              /* the subject string */
+    subject_length,       /* the length of the subject */
+    start_offset,         /* starting offset in the subject */
+    options,              /* options */
+    ovector,              /* output vector for substring information */
+    OVECCOUNT);           /* number of elements in the output vector */
+
+  /* This time, a result of NOMATCH isn't an error. If the value in "options"
+  is zero, it just means we have found all possible matches, so the loop ends.
+  Otherwise, it means we have failed to find a non-empty-string match at a
+  point where there was a previous empty-string match. In this case, we do what
+  Perl does: advance the matching position by one, and continue. We do this by
+  setting the "end of previous match" offset, because that is picked up at the
+  top of the loop as the point at which to start again. */
+
+  if (rc == PCRE_ERROR_NOMATCH)
+    {
+    if (options == 0) break;
+    ovector[1] = start_offset + 1;
+    continue;    /* Go round the loop again */
+    }
+
+  /* Other matching errors are not recoverable. */
+
+  if (rc < 0)
+    {
+    printf("Matching error %d\en", rc);
+    pcre_free(re);    /* Release memory used for the compiled pattern */
+    return 1;
+    }
+
+  /* Match succeeded */
+
+  printf("\enMatch succeeded again at offset %d\en", ovector[0]);
+
+  /* The match succeeded, but the output vector wasn't big enough. */
+
+  if (rc == 0)
+    {
+    rc = OVECCOUNT/3;
+    printf("ovector only has room for %d captured substrings\en", rc - 1);
+    }
+
+  /* As before, show substrings stored in the output vector by number, and then
+  also any named substrings. */
+
+  for (i = 0; i < rc; i++)
+    {
+    char *substring_start = subject + ovector[2*i];
+    int substring_length = ovector[2*i+1] - ovector[2*i];
+    printf("%2d: %.*s\en", i, substring_length, substring_start);
+    }
+
+  if (namecount <= 0) printf("No named substrings\en"); else
+    {
+    unsigned char *tabptr = name_table;
+    printf("Named substrings\en");
+    for (i = 0; i < namecount; i++)
+      {
+      int n = (tabptr[0] << 8) | tabptr[1];
+      printf("(%d) %*s: %.*s\en", n, name_entry_size - 3, tabptr + 2,
+        ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]);
+      tabptr += name_entry_size;
+      }
+    }
+  }      /* End of loop to find second and subsequent matches */
+
+printf("\en");
+pcre_free(re);       /* Release memory used for the compiled pattern */
+return 0;
+}
+
+/* End of pcredemo.c */
+.EE
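
Note on the listing above: the empty-string handling in the "-g" loop can be
tried out without linking against PCRE. The following is only a minimal
sketch. toy_exec() is a hypothetical stand-in for pcre_exec() that matches
the fixed pattern a*, and the TOY_* names are assumed placeholders for
PCRE_NOTEMPTY, PCRE_ANCHORED and PCRE_ERROR_NOMATCH; none of them are part of
the PCRE API. Only the control flow of find_all_matches() mirrors pcredemo.c.

```c
#include <assert.h>

#define TOY_NOTEMPTY 0x1    /* assumed stand-in for PCRE_NOTEMPTY */
#define TOY_ANCHORED 0x2    /* assumed stand-in for PCRE_ANCHORED */
#define TOY_NOMATCH  (-1)   /* assumed stand-in for PCRE_ERROR_NOMATCH */

/* Hypothetical matcher for the fixed pattern a* (zero or more 'a').
It scans forward from "start" unless TOY_ANCHORED is set, and rejects
empty matches when TOY_NOTEMPTY is set. On failure, ovector is untouched. */

static int toy_exec(const char *subject, int length, int start,
  int options, int *ovector)
{
int pos;
for (pos = start; pos <= length; pos++)
  {
  int end = pos;
  while (end < length && subject[end] == 'a') end++;
  if (end > pos || (options & TOY_NOTEMPTY) == 0)
    {
    ovector[0] = pos;
    ovector[1] = end;
    return 1;
    }
  if ((options & TOY_ANCHORED) != 0) break;   /* One attempt only */
  }
return TOY_NOMATCH;
}

/* Store up to max_matches (start, end) pairs in out[] and return how many
were found, using the same loop structure as the "-g" code in pcredemo.c. */

static int find_all_matches(const char *subject, int length,
  int out[][2], int max_matches)
{
int ovector[2];
int count = 0;

assert(max_matches > 0);
if (toy_exec(subject, length, 0, 0, ovector) == TOY_NOMATCH) return 0;
out[count][0] = ovector[0];
out[count][1] = ovector[1];
count++;

while (count < max_matches)
  {
  int options = 0;                 /* Normally no options */
  int start_offset = ovector[1];   /* Start at end of previous match */

  /* After an empty match, stop at the end of the subject; otherwise retry
  at the same point, asking for a non-empty match at that position only. */

  if (ovector[0] == ovector[1])
    {
    if (ovector[0] == length) break;
    options = TOY_NOTEMPTY | TOY_ANCHORED;
    }

  if (toy_exec(subject, length, start_offset, options, ovector) == TOY_NOMATCH)
    {
    if (options == 0) break;         /* All matches found */
    ovector[1] = start_offset + 1;   /* Advance by one and go round again */
    continue;
    }

  out[count][0] = ovector[0];
  out[count][1] = ovector[1];
  count++;
  }

return count;
}
```

Calling find_all_matches("baac", 4, out, 8) with this toy matcher yields four
matches with (start, end) offsets (0,0), (1,3), (3,3) and (4,4): the empty
matches before "b", before "c" and at the end of the subject are each
reported once, and the loop terminates.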


Modified: code/trunk/doc/pcregrep.txt
===================================================================
--- code/trunk/doc/pcregrep.txt    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/pcregrep.txt    2009-09-01 16:10:16 UTC (rev 429)
@@ -92,172 +92,181 @@


OPTIONS

-       --        This terminate the list of options. It is useful if the  next
-                 item  on  the command line starts with a hyphen but is not an
-                 option. This allows for the processing of patterns and  file-
+       The order in which some of the options appear can  affect  the  output.
+       For  example,  both  the  -h and -l options affect the printing of file
+       names. Whichever comes later in the command line will be the  one  that
+       takes effect.
+
+       --        This terminates the list of options. It is useful if the next
+                 item on the command line starts with a hyphen but is  not  an
+                 option.  This allows for the processing of patterns and file-
                  names that start with hyphens.


        -A number, --after-context=number
-                 Output  number  lines of context after each matching line. If
+                 Output number lines of context after each matching  line.  If
                  filenames and/or line numbers are being output, a hyphen sep-
-                 arator  is  used  instead of a colon for the context lines. A
-                 line containing "--" is output between each group  of  lines,
-                 unless  they  are  in  fact contiguous in the input file. The
-                 value of number is expected to be relatively small.  However,
+                 arator is used instead of a colon for the  context  lines.  A
+                 line  containing  "--" is output between each group of lines,
+                 unless they are in fact contiguous in  the  input  file.  The
+                 value  of number is expected to be relatively small. However,
                  pcregrep guarantees to have up to 8K of following text avail-
                  able for context output.


        -B number, --before-context=number
-                 Output number lines of context before each matching line.  If
+                 Output  number lines of context before each matching line. If
                  filenames and/or line numbers are being output, a hyphen sep-
-                 arator is used instead of a colon for the  context  lines.  A
-                 line  containing  "--" is output between each group of lines,
-                 unless they are in fact contiguous in  the  input  file.  The
-                 value  of number is expected to be relatively small. However,
+                 arator  is  used  instead of a colon for the context lines. A
+                 line containing "--" is output between each group  of  lines,
+                 unless  they  are  in  fact contiguous in the input file. The
+                 value of number is expected to be relatively small.  However,
                  pcregrep guarantees to have up to 8K of preceding text avail-
                  able for context output.


        -C number, --context=number
-                 Output  number  lines  of  context both before and after each
-                 matching line.  This is equivalent to setting both -A and  -B
+                 Output number lines of context both  before  and  after  each
+                 matching  line.  This is equivalent to setting both -A and -B
                  to the same value.


        -c, --count
-                 Do  not  output individual lines; instead just output a count
-                 of the number of lines that would otherwise have been output.
-                 If  several  files  are  given, a count is output for each of
-                 them. In this mode, the -A, -B, and -C options are ignored.
+                 Do not output individual lines from the files that are  being
+                 scanned; instead output the number of lines that would other-
+                 wise have been shown. If no lines are  selected,  the  number
+                 zero  is  output.  If  several  files  are being scanned, a
+                 count is output for each of them. However,  if  the  --files-
+                 with-matches  option  is  also  used,  only those files whose
+                 counts are greater than zero are listed. When -c is used, the
+                 -A, -B, and -C options are ignored.


        --colour, --color
                  If this option is given without any data, it is equivalent to
-                 "--colour=auto".   If  data  is required, it must be given in
+                 "--colour=auto".  If data is required, it must  be  given  in
                  the same shell item, separated by an equals sign.


        --colour=value, --color=value
                  This option specifies under what circumstances the parts of a
                  line that matched a pattern should be coloured in the output.
-                 By default, the output is not coloured. The value  (which  is
-                 optional,  see above) may be "never", "always", or "auto". In
-                 the latter case, colouring happens only if the standard  out-
-                 put  is connected to a terminal. More resources are used when
-                 colouring is enabled, because pcregrep has to search for  all
-                 possible  matches in a line, not just one, in order to colour
+                 By  default,  the output is not coloured. The value (which is
+                 optional, see above) may be "never", "always", or "auto".  In
+                 the  latter case, colouring happens only if the standard out-
+                 put is connected to a terminal. More resources are used  when
+                 colouring  is enabled, because pcregrep has to search for all
+                 possible matches in a line, not just one, in order to  colour
                  them all.


                  The colour that is used can be specified by setting the envi-
                  ronment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value
                  of this variable should be a string of two numbers, separated
-                 by  a  semicolon.  They  are copied directly into the control
-                 string for setting colour  on  a  terminal,  so  it  is  your
-                 responsibility  to ensure that they make sense. If neither of
-                 the environment variables is  set,  the  default  is  "1;31",
+                 by a semicolon. They are copied  directly  into  the  control
+                 string  for  setting  colour  on  a  terminal,  so it is your
+                 responsibility to ensure that they make sense. If neither  of
+                 the  environment  variables  is  set,  the default is "1;31",
                  which gives red.


        -D action, --devices=action
-                 If  an  input  path  is  not  a  regular file or a directory,
-                 "action" specifies how it is to be  processed.  Valid  values
+                 If an input path is  not  a  regular  file  or  a  directory,
+                 "action"  specifies  how  it is to be processed. Valid values
                  are "read" (the default) or "skip" (silently skip the path).


        -d action, --directories=action
                  If an input path is a directory, "action" specifies how it is
-                 to be processed.  Valid  values  are  "read"  (the  default),
-                 "recurse"  (equivalent to the -r option), or "skip" (silently
-                 skip the path). In the default case, directories are read  as
-                 if  they  were  ordinary files. In some operating systems the
-                 effect of reading a directory like this is an immediate  end-
+                 to  be  processed.   Valid  values  are "read" (the default),
+                 "recurse" (equivalent to the -r option), or "skip"  (silently
+                 skip  the path). In the default case, directories are read as
+                 if they were ordinary files. In some  operating  systems  the
+                 effect  of reading a directory like this is an immediate end-
                  of-file.


        -e pattern, --regex=pattern, --regexp=pattern
                  Specify a pattern to be matched. This option can be used mul-
                  tiple times in order to specify several patterns. It can also
-                 be  used  as a way of specifying a single pattern that starts
-                 with a hyphen. When -e is used, no argument pattern is  taken
-                 from  the  command  line;  all  arguments are treated as file
-                 names. There is an overall maximum of 100 patterns. They  are
-                 applied  to  each line in the order in which they are defined
+                 be used as a way of specifying a single pattern  that  starts
+                 with  a hyphen. When -e is used, no argument pattern is taken
+                 from the command line; all  arguments  are  treated  as  file
+                 names.  There is an overall maximum of 100 patterns. They are
+                 applied to each line in the order in which they  are  defined
                  until one matches (or fails to match if -v is used). If -f is
-                 used  with  -e,  the command line patterns are matched first,
-                 followed by the patterns from the file,  independent  of  the
-                 order  in which these options are specified. Note that multi-
+                 used with -e, the command line patterns  are  matched  first,
+                 followed  by  the  patterns from the file, independent of the
+                 order in which these options are specified. Note that  multi-
                  ple use of -e is not the same as a single pattern with alter-
                  natives. For example, X|Y finds the first character in a line
-                 that is X or Y, whereas if the two patterns are  given  sepa-
+                 that  is  X or Y, whereas if the two patterns are given sepa-
                  rately, pcregrep finds X if it is present, even if it follows
-                 Y in the line. It finds Y only if there is no X in the  line.
-                 This  really  matters  only  if  you are using -o to show the
+                 Y  in the line. It finds Y only if there is no X in the line.
+                 This really matters only if you are  using  -o  to  show  the
                  part(s) of the line that matched.


        --exclude=pattern
                  When pcregrep is searching the files in a directory as a con-
-                 sequence  of  the  -r  (recursive search) option, any regular
+                 sequence of the -r (recursive  search)  option,  any  regular
                  files whose names match the pattern are excluded. Subdirecto-
-                 ries  are  not  excluded  by  this  option; they are searched
-                 recursively, subject to the --exclude_dir  and  --include_dir
-                 options.  The  pattern  is  a PCRE regular expression, and is
+                 ries are not excluded  by  this  option;  they  are  searched
+                 recursively,  subject  to the --exclude_dir and --include_dir
+                 options. The pattern is a PCRE  regular  expression,  and  is
                  matched against the final component of the file name (not the
-                 entire  path).  If  a  file  name  matches both --include and
-                 --exclude, it is excluded.  There is no short form  for  this
+                 entire path). If a  file  name  matches  both  --include  and
+                 --exclude,  it  is excluded.  There is no short form for this
                  option.


        --exclude_dir=pattern
-                 When  pcregrep  is searching the contents of a directory as a
-                 consequence of the -r (recursive search) option,  any  subdi-
-                 rectories  whose  names match the pattern are excluded. (Note
-                 that the --exclude option does  not  affect  subdirectories.)
-                 The  pattern  is  a  PCRE  regular expression, and is matched
-                 against the final component  of  the  name  (not  the  entire
-                 path).  If a subdirectory name matches both --include_dir and
-                 --exclude_dir, it is excluded. There is  no  short  form  for
+                 When pcregrep is searching the contents of a directory  as  a
+                 consequence  of  the -r (recursive search) option, any subdi-
+                 rectories whose names match the pattern are  excluded.  (Note
+                 that  the  --exclude  option does not affect subdirectories.)
+                 The pattern is a PCRE  regular  expression,  and  is  matched
+                 against  the  final  component  of  the  name (not the entire
+                 path). If a subdirectory name matches both --include_dir  and
+                 --exclude_dir,  it  is  excluded.  There is no short form for
                  this option.


        -F, --fixed-strings
-                 Interpret  each pattern as a list of fixed strings, separated
-                 by newlines, instead of  as  a  regular  expression.  The  -w
-                 (match  as  a  word) and -x (match whole line) options can be
+                 Interpret each pattern as a list of fixed strings,  separated
+                 by  newlines,  instead  of  as  a  regular expression. The -w
+                 (match as a word) and -x (match whole line)  options  can  be
                  used with -F. They apply to each of the fixed strings. A line
                  is selected if any of the fixed strings are found in it (sub-
                  ject to -w or -x, if present).


        -f filename, --file=filename
-                 Read a number of patterns from the file, one  per  line,  and
-                 match  them against each line of input. A data line is output
+                 Read  a  number  of patterns from the file, one per line, and
+                 match them against each line of input. A data line is  output
                  if any of the patterns match it. The filename can be given as
                  "-" to refer to the standard input. When -f is used, patterns
-                 specified on the command line using -e may also  be  present;
+                 specified  on  the command line using -e may also be present;
                  they are tested before the file's patterns. However, no other
-                 pattern is taken from the command  line;  all  arguments  are
-                 treated  as  file  names.  There is an overall maximum of 100
+                 pattern  is  taken  from  the command line; all arguments are
+                 treated as file names. There is an  overall  maximum  of  100
                  patterns. Trailing white space is removed from each line, and
-                 blank  lines  are ignored. An empty file contains no patterns
-                 and therefore matches nothing. See also  the  comments  about
-                 multiple  patterns  versus a single pattern with alternatives
+                 blank lines are ignored. An empty file contains  no  patterns
+                 and  therefore  matches  nothing. See also the comments about
+                 multiple patterns versus a single pattern  with  alternatives
                  in the description of -e above.


        --file-offsets
-                 Instead of showing lines or parts of lines that  match,  show
-                 each  match  as  an  offset  from the start of the file and a
-                 length, separated by a comma. In this  mode,  no  context  is
-                 shown.  That  is,  the -A, -B, and -C options are ignored. If
+                 Instead  of  showing lines or parts of lines that match, show
+                 each match as an offset from the start  of  the  file  and  a
+                 length,  separated  by  a  comma. In this mode, no context is
+                 shown. That is, the -A, -B, and -C options  are  ignored.  If
                  there is more than one match in a line, each of them is shown
-                 separately.  This  option  is mutually exclusive with --line-
+                 separately. This option is mutually  exclusive  with  --line-
                  offsets and --only-matching.


        -H, --with-filename
-                 Force the inclusion of the filename at the  start  of  output
-                 lines  when searching a single file. By default, the filename
-                 is not shown in this case. For matching lines,  the  filename
+                 Force  the  inclusion  of the filename at the start of output
+                 lines when searching a single file. By default, the  filename
+                 is  not  shown in this case. For matching lines, the filename
                  is followed by a colon; for context lines, a hyphen separator
-                 is used. If a line number is also being  output,  it  follows
+                 is  used.  If  a line number is also being output, it follows
                  the file name.


        -h, --no-filename
-                 Suppress  the output filenames when searching multiple files.
-                 By default, filenames  are  shown  when  multiple  files  are
-                 searched.  For  matching lines, the filename is followed by a
-                 colon; for context lines, a hyphen separator is used.   If  a
+                 Suppress the output of filenames when searching multiple files.
+                 By  default,  filenames  are  shown  when  multiple files are
+                 searched. For matching lines, the filename is followed  by  a
+                 colon;  for  context lines, a hyphen separator is used.  If a
                  line number is also being output, it follows the file name.


-       --help    Output  a  help  message, giving brief details of the command
+       --help    Output a help message, giving brief details  of  the  command
                  options and file type support, and then exit.


        -i, --ignore-case
@@ -267,36 +276,40 @@
                  When pcregrep is searching the files in a directory as a con-
                  sequence of the -r (recursive search) option, only those reg-
                  ular files whose names match the pattern are included. Subdi-
-                 rectories  are always included and searched recursively, sub-
+                 rectories are always included and searched recursively,  sub-
                  ject to the --include_dir and --exclude_dir options. The pat-
                  tern is a PCRE regular expression, and is matched against the
-                 final component of the file name (not the entire path). If  a
+                 final  component of the file name (not the entire path). If a
                  file  name  matches  both  --include  and  --exclude,  it  is
                  excluded. There is no short form for this option.


        --include_dir=pattern
-                 When pcregrep is searching the contents of a directory  as  a
-                 consequence  of  the -r (recursive search) option, only those
-                 subdirectories whose names match the  pattern  are  included.
-                 (Note  that  the --include option does not affect subdirecto-
-                 ries.) The pattern is  a  PCRE  regular  expression,  and  is
-                 matched  against  the  final  component  of the name (not the
-                 entire  path).  If   a   subdirectory   name   matches   both
-                 --include_dir  and --exclude_dir, it is excluded. There is no
+                 When  pcregrep  is searching the contents of a directory as a
+                 consequence of the -r (recursive search) option,  only  those
+                 subdirectories  whose  names  match the pattern are included.
+                 (Note that the --include option does not  affect  subdirecto-
+                 ries.)  The  pattern  is  a  PCRE  regular expression, and is
+                 matched against the final component  of  the  name  (not  the
+                 entire   path).   If   a   subdirectory   name  matches  both
+                 --include_dir and --exclude_dir, it is excluded. There is  no
                  short form for this option.


        -L, --files-without-match
-                 Instead of outputting lines from the files, just  output  the
-                 names  of  the files that do not contain any lines that would
-                 have been output. Each file name is output once, on  a  sepa-
+                 Instead  of  outputting lines from the files, just output the
+                 names of the files that do not contain any lines  that  would
+                 have  been  output. Each file name is output once, on a sepa-
                  rate line.


        -l, --files-with-matches
-                 Instead  of  outputting lines from the files, just output the
+                 Instead of outputting lines from the files, just  output  the
                  names of the files containing lines that would have been out-
-                 put.  Each  file  name  is  output  once, on a separate line.
-                 Searching stops as soon as a matching  line  is  found  in  a
-                 file.
+                 put. Each file name is  output  once,  on  a  separate  line.
+                 Searching  normally stops as soon as a matching line is found
+                 in a file. However, if the -c (count) option  is  also  used,
+                 matching  continues in order to obtain the correct count, and
+                 those files that have at least one  match  are  listed  along
+                 with their counts. Using this option with -c is a way of sup-
+                 pressing the listing of files with no matches.


        --label=name
                  This option supplies a name to be used for the standard input
@@ -304,106 +317,106 @@
                  input)" is used. There is no short form for this option.


        --line-offsets
-                 Instead  of  showing lines or parts of lines that match, show
+                 Instead of showing lines or parts of lines that  match,  show
                  each match as a line number, the offset from the start of the
-                 line,  and a length. The line number is terminated by a colon
-                 (as usual; see the -n option), and the offset and length  are
-                 separated  by  a  comma.  In  this mode, no context is shown.
-                 That is, the -A, -B, and -C options are ignored. If there  is
-                 more  than  one  match in a line, each of them is shown sepa-
+                 line, and a length. The line number is terminated by a  colon
+                 (as  usual; see the -n option), and the offset and length are
+                 separated by a comma. In this  mode,  no  context  is  shown.
+                 That  is, the -A, -B, and -C options are ignored. If there is
+                 more than one match in a line, each of them  is  shown  sepa-
                  rately. This option is mutually exclusive with --file-offsets
                  and --only-matching.


        --locale=locale-name
-                 This  option specifies a locale to be used for pattern match-
-                 ing. It overrides the value in the LC_ALL or  LC_CTYPE  envi-
-                 ronment  variables.  If  no  locale  is  specified,  the PCRE
-                 library's default (usually the "C" locale) is used. There  is
+                 This option specifies a locale to be used for pattern  match-
+                 ing.  It  overrides the value in the LC_ALL or LC_CTYPE envi-
+                 ronment variables.  If  no  locale  is  specified,  the  PCRE
+                 library's  default (usually the "C" locale) is used. There is
                  no short form for this option.


        -M, --multiline
-                 Allow  patterns to match more than one line. When this option
+                 Allow patterns to match more than one line. When this  option
                  is given, patterns may usefully contain literal newline char-
-                 acters  and  internal  occurrences of ^ and $ characters. The
-                 output for any one match may consist of more than  one  line.
-                 When  this option is set, the PCRE library is called in "mul-
-                 tiline" mode.  There is a limit to the number of  lines  that
-                 can  be matched, imposed by the way that pcregrep buffers the
-                 input file as it scans it. However, pcregrep ensures that  at
+                 acters and internal occurrences of ^ and  $  characters.  The
+                 output  for  any one match may consist of more than one line.
+                 When this option is set, the PCRE library is called in  "mul-
+                 tiline"  mode.   There is a limit to the number of lines that
+                 can be matched, imposed by the way that pcregrep buffers  the
+                 input  file as it scans it. However, pcregrep ensures that at
                  least 8K characters or the rest of the document (whichever is
-                 the shorter) are available for forward  matching,  and  simi-
+                 the  shorter)  are  available for forward matching, and simi-
                  larly the previous 8K characters (or all the previous charac-
-                 ters, if fewer than 8K) are guaranteed to  be  available  for
+                 ters,  if  fewer  than 8K) are guaranteed to be available for
                  lookbehind assertions.


        -N newline-type, --newline=newline-type
-                 The  PCRE  library  supports  five  different conventions for
-                 indicating the ends of lines. They are  the  single-character
-                 sequences  CR  (carriage  return) and LF (linefeed), the two-
-                 character sequence CRLF, an "anycrlf" convention, which  rec-
-                 ognizes  any  of the preceding three types, and an "any" con-
+                 The PCRE library  supports  five  different  conventions  for
+                 indicating  the  ends of lines. They are the single-character
+                 sequences CR (carriage return) and LF  (linefeed),  the  two-
+                 character  sequence CRLF, an "anycrlf" convention, which rec-
+                 ognizes any of the preceding three types, and an  "any"  con-
                  vention, in which any Unicode line ending sequence is assumed
-                 to  end a line. The Unicode sequences are the three just men-
-                 tioned,  plus  VT  (vertical  tab,  U+000B),  FF   (formfeed,
-                 U+000C),   NEL  (next  line,  U+0085),  LS  (line  separator,
+                 to end a line. The Unicode sequences are the three just  men-
+                 tioned,   plus  VT  (vertical  tab,  U+000B),  FF  (formfeed,
+                 U+000C),  NEL  (next  line,  U+0085),  LS  (line   separator,
                  U+2028), and PS (paragraph separator, U+2029).


                  When  the  PCRE  library  is  built,  a  default  line-ending
-                 sequence   is  specified.   This  is  normally  the  standard
+                 sequence  is  specified.   This  is  normally  the   standard
                  sequence for the operating system. Unless otherwise specified
-                 by  this  option,  pcregrep  uses the library's default.  The
+                 by this option, pcregrep uses  the  library's  default.   The
                  possible values for this option are CR, LF, CRLF, ANYCRLF, or
-                 ANY.  This  makes  it  possible to use pcregrep on files that
-                 have come from other environments without  having  to  modify
-                 their  line  endings.  If the data that is being scanned does
-                 not agree with the convention set by  this  option,  pcregrep
+                 ANY. This makes it possible to use  pcregrep  on  files  that
+                 have  come  from  other environments without having to modify
+                 their line endings. If the data that is  being  scanned  does
+                 not  agree  with  the convention set by this option, pcregrep
                  may behave in strange ways.


        -n, --line-number
                  Precede each output line by its line number in the file, fol-
-                 lowed by a colon for matching lines or a hyphen  for  context
-                 lines.  If the filename is also being output, it precedes the
+                 lowed  by  a colon for matching lines or a hyphen for context
+                 lines. If the filename is also being output, it precedes  the
                  line number. This option is forced if --line-offsets is used.


        -o, --only-matching
-                 Show only the part of the line that  matched  a  pattern.  In
-                 this  mode,  no context is shown. That is, the -A, -B, and -C
-                 options are ignored. If there is more than  one  match  in  a
-                 line,  each  of  them  is shown separately. If -o is combined
-                 with -v (invert the sense of the match to  find  non-matching
-                 lines),  no  output  is generated, but the return code is set
+                 Show  only  the  part  of the line that matched a pattern. In
+                 this mode, no context is shown. That is, the -A, -B,  and  -C
+                 options  are  ignored.  If  there is more than one match in a
+                 line, each of them is shown separately.  If  -o  is  combined
+                 with  -v  (invert the sense of the match to find non-matching
+                 lines), no output is generated, but the return  code  is  set
                  appropriately. This option is mutually exclusive with --file-
                  offsets and --line-offsets.


        -q, --quiet
                  Work quietly, that is, display nothing except error messages.
-                 The exit status indicates whether or  not  any  matches  were
+                 The  exit  status  indicates  whether or not any matches were
                  found.


        -r, --recursive
-                 If  any given path is a directory, recursively scan the files
-                 it contains, taking note of any --include and --exclude  set-
-                 tings.  By  default, a directory is read as a normal file; in
-                 some operating systems this gives an  immediate  end-of-file.
-                 This  option  is  a  shorthand  for  setting the -d option to
+                 If any given path is a directory, recursively scan the  files
+                 it  contains, taking note of any --include and --exclude set-
+                 tings. By default, a directory is read as a normal  file;  in
+                 some  operating  systems this gives an immediate end-of-file.
+                 This option is a shorthand  for  setting  the  -d  option  to
                  "recurse".


        -s, --no-messages
-                 Suppress error  messages  about  non-existent  or  unreadable
-                 files.  Such  files  are quietly skipped. However, the return
+                 Suppress  error  messages  about  non-existent  or unreadable
+                 files. Such files are quietly skipped.  However,  the  return
                  code is still 2, even if matches were found in other files.


        -u, --utf-8
-                 Operate in UTF-8 mode. This option is available only if  PCRE
-                 has  been compiled with UTF-8 support. Both patterns and sub-
+                 Operate  in UTF-8 mode. This option is available only if PCRE
+                 has been compiled with UTF-8 support. Both patterns and  sub-
                  ject lines must be valid strings of UTF-8 characters.


        -V, --version
-                 Write the version numbers of pcregrep and  the  PCRE  library
+                 Write  the  version  numbers of pcregrep and the PCRE library
                  that is being used to the standard error stream.


        -v, --invert-match
-                 Invert  the  sense  of  the match, so that lines which do not
+                 Invert the sense of the match, so that  lines  which  do  not
                  match any of the patterns are the ones that are found.


        -w, --word-regex, --word-regexp
@@ -411,39 +424,40 @@
                  lent to having \b at the start and end of the pattern.


        -x, --line-regex, --line-regexp
-                 Force  the  patterns to be anchored (each must start matching
-                 at the beginning of a line) and in addition, require them  to
-                 match  entire  lines.  This  is  equivalent to having ^ and $
+                 Force the patterns to be anchored (each must  start  matching
+                 at  the beginning of a line) and in addition, require them to
+                 match entire lines. This is equivalent  to  having  ^  and  $
                  characters at the start and end of each alternative branch in
                  every pattern.



ENVIRONMENT VARIABLES

-       The  environment  variables  LC_ALL  and LC_CTYPE are examined, in that
-       order, for a locale. The first one that is set is  used.  This  can  be
-       overridden  by  the  --locale  option.  If  no  locale is set, the PCRE
+       The environment variables LC_ALL and LC_CTYPE  are  examined,  in  that
+       order,  for  a  locale.  The first one that is set is used. This can be
+       overridden by the --locale option.  If  no  locale  is  set,  the  PCRE
        library's default (usually the "C" locale) is used.



NEWLINES

-       The -N (--newline) option allows pcregrep to scan files with  different
-       newline  conventions  from  the  default.  However, the setting of this
-       option does not affect the way in which pcregrep writes information  to
-       the  standard  error  and  output streams. It uses the string "\n" in C
-       printf() calls to indicate newlines, relying on the C  I/O  library  to
-       convert  this  to  an  appropriate  sequence if the output is sent to a
+       The  -N (--newline) option allows pcregrep to scan files with different
+       newline conventions from the default.  However,  the  setting  of  this
+       option  does not affect the way in which pcregrep writes information to
+       the standard error and output streams. It uses the  string  "\n"  in  C
+       printf()  calls  to  indicate newlines, relying on the C I/O library to
+       convert this to an appropriate sequence if the  output  is  sent  to  a
        file.



OPTIONS COMPATIBILITY

        The majority of short and long forms of pcregrep's options are the same
-       as  in  the  GNU grep program. Any long option of the form --xxx-regexp
-       (GNU terminology) is also available as --xxx-regex (PCRE  terminology).
-       However,  the  --locale,  -M,  --multiline, -u, and --utf-8 options are
-       specific to pcregrep.
+       as in the GNU grep program. Any long option of  the  form  --xxx-regexp
+       (GNU  terminology) is also available as --xxx-regex (PCRE terminology).
+       However, the --locale, -M, --multiline, -u,  and  --utf-8  options  are
+       specific to pcregrep. If both the -c and -l options are given, GNU grep
+       lists only file names, without counts, but pcregrep gives the counts.



OPTIONS WITH DATA
@@ -508,5 +522,5 @@

REVISION

-       Last updated: 01 March 2009
+       Last updated: 12 August 2009
        Copyright (c) 1997-2009 University of Cambridge.
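
The newline conventions listed for pcregrep's -N (--newline) option above can
be illustrated with a small sketch. This is not pcregrep's implementation: the
names NEWLINE_PATTERNS and split_lines are hypothetical, and the sketch assumes
already-decoded text rather than the raw bytes that pcregrep scans.

```python
import re

# Hypothetical mapping from the option values named in the documentation
# (CR, LF, CRLF, ANYCRLF, ANY) to a regular expression for a line ending.
# CRLF is tried first in the alternations so that "\r\n" counts as one ending.
NEWLINE_PATTERNS = {
    "CR": "\r",
    "LF": "\n",
    "CRLF": "\r\n",
    "ANYCRLF": "\r\n|\r|\n",
    # ANY: the three sequences above plus VT, FF, NEL, LS, and PS.
    "ANY": "\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}


def split_lines(text, convention):
    """Split decoded text into lines under the given newline convention."""
    return re.split(NEWLINE_PATTERNS[convention], text)


if __name__ == "__main__":
    sample = "one\r\ntwo\nthree"
    print(split_lines(sample, "LF"))       # the CR stays attached to "one"
    print(split_lines(sample, "ANYCRLF"))  # both endings are recognised
```

Reading the same data with the wrong convention leaves stray characters on the
lines, which is one way the "strange" behaviour mentioned above can arise.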


Modified: code/trunk/doc/pcresample.3
===================================================================
--- code/trunk/doc/pcresample.3    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/pcresample.3    2009-09-01 16:10:16 UTC (rev 429)
@@ -5,7 +5,13 @@
 .rs
 .sp
 A simple, complete demonstration program, to get you started with using PCRE,
-is supplied in the file \fIpcredemo.c\fP in the PCRE distribution.
+is supplied in the file \fIpcredemo.c\fP in the PCRE distribution. A listing of
+this program is given in the
+.\" HREF
+\fBpcredemo\fP
+.\"
+documentation. If you do not have a copy of the PCRE distribution, you can save 
+this listing to re-create \fIpcredemo.c\fP.
 .P
 The program compiles the regular expression that is its first argument, and
 matches it against the subject string in its second argument. No PCRE options
@@ -44,12 +50,18 @@
 \fBpcretest\fP,
 .\"
 which supports many more facilities for testing regular expressions and the
-PCRE library. The \fBpcredemo\fP program is provided as a simple coding
-example.
+PCRE library. The 
+.\" HREF
+\fBpcredemo\fP
+.\"
+program is provided as a simple coding example.
 .P
-On some operating systems (e.g. Solaris), when PCRE is not installed in the
-standard library directory, you may get an error like this when you try to run
-\fBpcredemo\fP:
+When you try to run
+.\" HREF
+\fBpcredemo\fP
+.\"
+when PCRE is not installed in the standard library directory, you may get an
+error like this on some operating systems (e.g. Solaris):
 .sp
   ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
 .sp
@@ -75,6 +87,6 @@
 .rs
 .sp
 .nf
-Last updated: 23 January 2008
-Copyright (c) 1997-2008 University of Cambridge.
+Last updated: 01 September 2009
+Copyright (c) 1997-2009 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcretest.txt
===================================================================
--- code/trunk/doc/pcretest.txt    2009-08-31 17:10:26 UTC (rev 428)
+++ code/trunk/doc/pcretest.txt    2009-09-01 16:10:16 UTC (rev 429)
@@ -326,8 +326,9 @@
                       or pcre_dfa_exec()
          \Odd       set the size of the output vector passed to
                       pcre_exec() to dd (any number of digits)
-         \P         pass the PCRE_PARTIAL option to pcre_exec()
-                      or pcre_dfa_exec()
+         \P         pass the PCRE_PARTIAL_SOFT option to pcre_exec()
+                      or pcre_dfa_exec(); if used twice, pass the
+                      PCRE_PARTIAL_HARD option
          \Qdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd
                       (any number of digits)
          \R         pass the PCRE_DFA_RESTART option to pcre_dfa_exec()
@@ -413,9 +414,10 @@
        When a match succeeds, pcretest outputs the list of captured substrings
        that  pcre_exec()  returns,  starting with number 0 for the string that
        matched the whole pattern. Otherwise, it outputs "No match" or "Partial
-       match"  when  pcre_exec() returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PAR-
-       TIAL, respectively, and otherwise the PCRE negative error number.  Here
-       is an example of an interactive pcretest run.
+       match:"  followed  by the partially matching substring when pcre_exec()
+       returns PCRE_ERROR_NOMATCH  or  PCRE_ERROR_PARTIAL,  respectively,  and
+       otherwise  the  PCRE  negative  error  number. Here is an example of an
+       interactive pcretest run.


          $ pcretest
          PCRE version 7.0 30-Nov-2006
@@ -427,11 +429,11 @@
          data> xyz
          No match


-       Note  that unset capturing substrings that are not followed by one that
-       is set are not returned by pcre_exec(), and are not shown by  pcretest.
-       In  the following example, there are two capturing substrings, but when
-       the first data line is matched, the  second,  unset  substring  is  not
-       shown.  An "internal" unset substring is shown as "<unset>", as for the
+       Note that unset capturing substrings that are not followed by one  that
+       is  set are not returned by pcre_exec(), and are not shown by pcretest.
+       In the following example, there are two capturing substrings, but  when
+       the  first  data  line  is  matched, the second, unset substring is not
+       shown. An "internal" unset substring is shown as "<unset>", as for  the
        second data line.


            re> /(a)|(b)/
@@ -443,11 +445,11 @@
           1: <unset>
           2: b


-       If the strings contain any non-printing characters, they are output  as
-       \0x  escapes,  or  as \x{...} escapes if the /8 modifier was present on
-       the pattern. See below for the definition of  non-printing  characters.
-       If  the pattern has the /+ modifier, the output for substring 0 is fol-
-       lowed by the rest of the subject string, identified  by  "0+"  like
+       If  the strings contain any non-printing characters, they are output as
+       \0x escapes, or as \x{...} escapes if the /8 modifier  was  present  on
+       the  pattern.  See below for the definition of non-printing characters.
+       If the pattern has the /+ modifier, the output for substring 0 is  fol-
+       lowed  by  the rest of the subject string, identified by "0+" like
        this:


            re> /cat/+
@@ -455,7 +457,7 @@
           0: cat
           0+ aract


-       If  the  pattern  has  the /g or /G modifier, the results of successive
+       If the pattern has the /g or /G modifier,  the  results  of  successive
        matching attempts are output in sequence, like this:


            re> /\Bi(\w\w)/g
@@ -469,24 +471,24 @@


        "No match" is output only if the first match attempt fails.


-       If any of the sequences \C, \G, or \L are present in a data  line  that
-       is  successfully  matched,  the substrings extracted by the convenience
+       If  any  of the sequences \C, \G, or \L are present in a data line that
+       is successfully matched, the substrings extracted  by  the  convenience
        functions are output with C, G, or L after the string number instead of
        a colon. This is in addition to the normal full list. The string length
-       (that is, the return from the extraction function) is given  in  paren-
+       (that  is,  the return from the extraction function) is given in paren-
        theses after each string for \C and \G.


        Note that whereas patterns can be continued over several lines (a plain
        ">" prompt is used for continuations), data lines may not. However new-
-       lines  can  be included in data by means of the \n escape (or \r, \r\n,
+       lines can be included in data by means of the \n escape (or  \r,  \r\n,
        etc., depending on the newline sequence setting).



OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

-       When the alternative matching function, pcre_dfa_exec(),  is  used  (by
-       means  of  the \D escape sequence or the -dfa command line option), the
-       output consists of a list of all the matches that start  at  the  first
+       When  the  alternative  matching function, pcre_dfa_exec(), is used (by
+       means of the \D escape sequence or the -dfa command line  option),  the
+       output  consists  of  a list of all the matches that start at the first
        point in the subject where there is at least one match. For example:


            re> /(tang|tangerine|tan)/
@@ -495,8 +497,10 @@
           1: tang
           2: tan


-       (Using  the  normal  matching function on this data finds only "tang".)
-       The longest matching string is always given first (and numbered zero).
+       (Using the normal matching function on this data  finds  only  "tang".)
+       The  longest matching string is always given first (and numbered zero).
+       After a PCRE_ERROR_PARTIAL return, the output is "Partial match:", fol-
+       lowed by the partially matching substring.


        If /g is present on the pattern, the search for further matches resumes
        at the end of the longest match. For example:
@@ -510,16 +514,16 @@
           1: tan
           0: tan


-       Since  the  matching  function  does not support substring capture, the
-       escape sequences that are concerned with captured  substrings  are  not
+       Since the matching function does not  support  substring  capture,  the
+       escape  sequences  that  are concerned with captured substrings are not
        relevant.



RESTARTING AFTER A PARTIAL MATCH

        When the alternative matching function has given the PCRE_ERROR_PARTIAL
-       return, indicating that the subject partially matched the pattern,  you
-       can  restart  the match with additional subject data by means of the \R
+       return,  indicating that the subject partially matched the pattern, you
+       can restart the match with additional subject data by means of  the  \R
        escape sequence. For example:


            re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@@ -528,30 +532,30 @@
          data> n05\R\D
           0: n05


-       For further information about partial  matching,  see  the  pcrepartial
+       For  further  information  about  partial matching, see the pcrepartial
        documentation.



CALLOUTS

-       If  the pattern contains any callout requests, pcretest's callout func-
-       tion is called during matching. This works  with  both  matching  func-
+       If the pattern contains any callout requests, pcretest's callout  func-
+       tion  is  called  during  matching. This works with both matching func-
        tions. By default, the called function displays the callout number, the
-       start and current positions in the text at the callout  time,  and  the
+       start  and  current  positions in the text at the callout time, and the
        next pattern item to be tested. For example, the output


          --->pqrabcdef
            0    ^  ^     \d


-       indicates  that  callout number 0 occurred for a match attempt starting
-       at the fourth character of the subject string, when the pointer was  at
-       the  seventh  character of the data, and when the next pattern item was
-       \d. Just one circumflex is output if the start  and  current  positions
+       indicates that callout number 0 occurred for a match  attempt  starting
+       at  the fourth character of the subject string, when the pointer was at
+       the seventh character of the data, and when the next pattern  item  was
+       \d.  Just  one  circumflex is output if the start and current positions
        are the same.


        Callouts numbered 255 are assumed to be automatic callouts, inserted as
-       a result of the /C pattern modifier. In this case, instead  of  showing
-       the  callout  number, the offset in the pattern, preceded by a plus, is
+       a  result  of the /C pattern modifier. In this case, instead of showing
+       the callout number, the offset in the pattern, preceded by a  plus,  is
        output. For example:


            re> /\d?[A-E]\*/C
@@ -563,86 +567,86 @@
          +10 ^ ^
           0: E*


-       The callout function in pcretest returns zero (carry  on  matching)  by
-       default,  but you can use a \C item in a data line (as described above)
+       The  callout  function  in pcretest returns zero (carry on matching) by
+       default, but you can use a \C item in a data line (as described  above)
        to change this.


-       Inserting callouts can be helpful when using pcretest to check  compli-
-       cated  regular expressions. For further information about callouts, see
+       Inserting  callouts can be helpful when using pcretest to check compli-
+       cated regular expressions. For further information about callouts,  see
        the pcrecallout documentation.



NON-PRINTING CHARACTERS

-       When pcretest is outputting text in the compiled version of a  pattern,
-       bytes  other  than 32-126 are always treated as non-printing characters
+       When  pcretest is outputting text in the compiled version of a pattern,
+       bytes other than 32-126 are always treated as  non-printing  characters
        and are therefore shown as hex escapes.


-       When pcretest is outputting text that is a matched part  of  a  subject
-       string,  it behaves in the same way, unless a different locale has been
-       set for the  pattern  (using  the  /L  modifier).  In  this  case,  the
+       When  pcretest  is  outputting text that is a matched part of a subject
+       string, it behaves in the same way, unless a different locale has  been
+       set  for  the  pattern  (using  the  /L  modifier).  In  this case, the
        isprint() function is used to distinguish printing and non-printing
        characters.



SAVING AND RELOADING COMPILED PATTERNS

-       The  facilities  described  in  this section are not available when the
+       The facilities described in this section are  not  available  when  the
        POSIX interface to PCRE is being used, that is, when the /P pattern mod-
        ifier is specified.


        When the POSIX interface is not in use, you can cause pcretest to write
-       a compiled pattern to a file, by following the modifiers with >  and  a
+       a  compiled  pattern to a file, by following the modifiers with > and a
        file name.  For example:


          /pattern/im >/some/file


-       See  the pcreprecompile documentation for a discussion about saving and
+       See the pcreprecompile documentation for a discussion about saving  and
        re-using compiled patterns.


-       The data that is written is binary.  The  first  eight  bytes  are  the
-       length  of  the  compiled  pattern  data  followed by the length of the
-       optional study data, each written as four  bytes  in  big-endian  order
-       (most  significant  byte  first). If there is no study data (either the
+       The  data  that  is  written  is  binary. The first eight bytes are the
+       length of the compiled pattern data  followed  by  the  length  of  the
+       optional  study  data,  each  written as four bytes in big-endian order
+       (most significant byte first). If there is no study  data  (either  the
        pattern was not studied, or studying did not return any data), the sec-
-       ond  length  is  zero. The lengths are followed by an exact copy of the
+       ond length is zero. The lengths are followed by an exact  copy  of  the
        compiled pattern. If there is additional study data, this follows imme-
-       diately  after  the  compiled pattern. After writing the file, pcretest
+       diately after the compiled pattern. After writing  the  file,  pcretest
        expects to read a new pattern.


        A saved pattern can be reloaded into pcretest by specifying < and a file
-       name  instead  of  a pattern. The name of the file must not contain a <
-       character, as otherwise pcretest will interpret the line as  a  pattern
+       name instead of a pattern. The name of the file must not  contain  a  <
+       character,  as  otherwise pcretest will interpret the line as a pattern
        delimited by < characters.  For example:


           re> </some/file
          Compiled regex loaded from /some/file
          No study data


-       When  the pattern has been loaded, pcretest proceeds to read data lines
+       When the pattern has been loaded, pcretest proceeds to read data  lines
        in the usual way.


-       You can copy a file written by pcretest to a different host and  reload
-       it  there,  even  if the new host has opposite endianness to the one on
-       which the pattern was compiled. For example, you can compile on an  i86
+       You  can copy a file written by pcretest to a different host and reload
+       it there, even if the new host has opposite endianness to  the  one  on
+       which  the pattern was compiled. For example, you can compile on an i86
        machine and run on a SPARC machine.


-       File  names  for  saving and reloading can be absolute or relative, but
-       note that the shell facility of expanding a file name that starts  with
+       File names for saving and reloading can be absolute  or  relative,  but
+       note  that the shell facility of expanding a file name that starts with
        a tilde (~) is not available.


-       The  ability to save and reload files in pcretest is intended for test-
-       ing and experimentation. It is not intended for production use  because
-       only  a  single pattern can be written to a file. Furthermore, there is
-       no facility for supplying  custom  character  tables  for  use  with  a
-       reloaded  pattern.  If  the  original  pattern was compiled with custom
-       tables, an attempt to match a subject string using a  reloaded  pattern
-       is  likely to cause pcretest to crash.  Finally, if you attempt to load
+       The ability to save and reload files in pcretest is intended for  test-
+       ing  and experimentation. It is not intended for production use because
+       only a single pattern can be written to a file. Furthermore,  there  is
+       no  facility  for  supplying  custom  character  tables  for use with a
+       reloaded pattern. If the original  pattern  was  compiled  with  custom
+       tables,  an  attempt to match a subject string using a reloaded pattern
+       is likely to cause pcretest to crash.  Finally, if you attempt to  load
        a file that is not in the correct format, the result is undefined.



SEE ALSO

-       pcre(3), pcreapi(3), pcrecallout(3),  pcrematching(3),  pcrepartial(3),
+       pcre(3),  pcreapi(3),  pcrecallout(3), pcrematching(3), pcrepartial(3),
        pcrepattern(3), pcreprecompile(3).



@@ -655,5 +659,5 @@

REVISION

-       Last updated: 10 March 2009
+       Last updated: 29 August 2009
        Copyright (c) 1997-2009 University of Cambridge.
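
The layout of a file saved by pcretest, as described in the "SAVING AND
RELOADING COMPILED PATTERNS" text above (two four-byte big-endian lengths,
then the compiled pattern, then any study data), can be read with a short
sketch like the following. The function name split_saved_pattern is
hypothetical, and treating the lengths as unsigned is an assumption; the
text only says they are four bytes each, most significant byte first.

```python
import struct


def split_saved_pattern(blob):
    """Split the bytes of a pcretest save file into (pattern, study) parts.

    Assumes the layout given in the pcretest documentation: an eight-byte
    header holding two big-endian four-byte lengths, followed by the
    compiled pattern and then the optional study data.
    """
    if len(blob) < 8:
        raise ValueError("file too short to hold the two length fields")
    # ">II" = big-endian, two unsigned 32-bit integers (unsigned is assumed).
    pattern_len, study_len = struct.unpack(">II", blob[:8])
    pattern = blob[8:8 + pattern_len]
    study = blob[8 + pattern_len:8 + pattern_len + study_len]
    if len(pattern) != pattern_len or len(study) != study_len:
        raise ValueError("file is shorter than its header claims")
    return pattern, study


if __name__ == "__main__":
    # Synthetic bytes standing in for a real save file.
    fake = struct.pack(">II", 3, 0) + b"abc"
    print(split_saved_pattern(fake))
```

The sketch only separates the two parts. The compiled pattern itself is an
opaque binary image, so this is useful for inspection, not for reuse.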