[Pcre-svn] [456] code/trunk: Documentation update

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [456] code/trunk: Documentation update
Revision: 456
          http://vcs.pcre.org/viewvc?view=rev&revision=456
Author:   ph10
Date:     2009-10-02 09:53:31 +0100 (Fri, 02 Oct 2009)


Log Message:
-----------
Documentation update

Modified Paths:
--------------
    code/trunk/HACKING
    code/trunk/doc/pcre.3
    code/trunk/doc/pcre_compile.3
    code/trunk/doc/pcre_compile2.3
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcrebuild.3
    code/trunk/doc/pcrecallout.3
    code/trunk/doc/pcrecompat.3
    code/trunk/doc/pcrematching.3
    code/trunk/doc/pcrepartial.3
    code/trunk/doc/pcrepattern.3
    code/trunk/doc/pcresample.3
    code/trunk/doc/perltest.txt


Modified: code/trunk/HACKING
===================================================================
--- code/trunk/HACKING    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/HACKING    2009-10-02 08:53:31 UTC (rev 456)
@@ -67,22 +67,22 @@
 functions to work this way. This got rid of about 600 lines of source. It
 should make future maintenance and development easier. As this was such a major 
 change, I never released 6.8, instead upping the number to 7.0 (other quite 
-major changes are also present in the 7.0 release).
+major changes were also present in the 7.0 release).


-A side effect of this work is that the previous limit of 200 on the nesting
+A side effect of this work was that the previous limit of 200 on the nesting
 depth of parentheses was removed. However, there is a downside: pcre_compile()
 runs more slowly than before (30% or more, depending on the pattern) because it
-is doing a full analysis of the pattern. My hope is that this is not a big
-issue.
+is doing a full analysis of the pattern. My hope was that this would not be a
+big issue, and in the event, nobody has commented on it.

 Traditional matching function
 -----------------------------

 The "traditional", and original, matching function is called pcre_exec(), and
 it implements an NFA algorithm, similar to the original Henry Spencer algorithm
-and the way that Perl works. Not surprising, since it is intended to be as
-compatible with Perl as possible. This is the function most users of PCRE will
-use most of the time.
+and the way that Perl works. This is not surprising, since it is intended to be
+as compatible with Perl as possible. This is the function most users of PCRE
+will use most of the time.

 Supplementary matching function
 -------------------------------
@@ -119,6 +119,7 @@

 A list of the opcodes follows:

+
 Opcodes with no following data
 ------------------------------

@@ -150,12 +151,12 @@
   OP_EXTUNI              match an extended Unicode character 
   OP_ANYNL               match any Unicode newline sequence 


-  OP_ACCEPT              )
-  OP_COMMIT              ) 
-  OP_FAIL                ) These are Perl 5.10's "backtracking     
-  OP_PRUNE               ) control verbs".                         
-  OP_SKIP                )
-  OP_THEN                )
+  OP_ACCEPT              ) These are Perl 5.10's "backtracking    
+  OP_COMMIT              ) control verbs". If OP_ACCEPT is inside
+  OP_FAIL                ) capturing parentheses, it may be preceded 
+  OP_PRUNE               ) by one or more OP_CLOSE, followed by a 2-byte 
+  OP_SKIP                ) number, indicating which parentheses must be
+  OP_THEN                ) closed.



 Repeating single characters
@@ -415,4 +416,4 @@
 data.

 Philip Hazel
-April 2008
+October 2009

Modified: code/trunk/doc/pcre.3
===================================================================
--- code/trunk/doc/pcre.3    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/doc/pcre.3    2009-10-02 08:53:31 UTC (rev 456)
@@ -6,21 +6,20 @@
 .sp
 The PCRE library is a set of functions that implement regular expression
 pattern matching using the same syntax and semantics as Perl, with just a few
-differences. Certain features that appeared in Python and PCRE before they
-appeared in Perl are also available using the Python syntax. There is also some
-support for certain .NET and Oniguruma syntax items, and there is an option for
-requesting some minor changes that give better JavaScript compatibility.
+differences. Some features that appeared in Python and PCRE before they
+appeared in Perl are also available using the Python syntax, there is some
+support for one or two .NET and Oniguruma syntax items, and there is an option
+for requesting some minor changes that give better JavaScript compatibility.
 .P
-The current implementation of PCRE (release 8.xx) corresponds approximately
-with Perl 5.10, including support for UTF-8 encoded strings and Unicode general
-category properties. However, UTF-8 and Unicode support has to be explicitly
-enabled; it is not the default. The Unicode tables correspond to Unicode
-release 5.1.
+The current implementation of PCRE corresponds approximately with Perl 5.10,
+including support for UTF-8 encoded strings and Unicode general category
+properties. However, UTF-8 and Unicode support has to be explicitly enabled; it
+is not the default. The Unicode tables correspond to Unicode release 5.1.
 .P
 In addition to the Perl-compatible matching function, PCRE contains an
-alternative matching function that matches the same compiled patterns in a
-different way. In certain circumstances, the alternative function has some
-advantages. For a discussion of the two matching algorithms, see the
+alternative function that matches the same compiled patterns in a different
+way. In certain circumstances, the alternative function has some advantages.
+For a discussion of the two matching algorithms, see the
 .\" HREF
 \fBpcrematching\fP
 .\"
@@ -66,7 +65,8 @@
 \fBpcrebuild\fP
 .\"
 page. Documentation about building PCRE for various operating systems can be
-found in the \fBREADME\fP file in the source distribution.
+found in the \fBREADME\fP and \fBNON-UNIX-USE\fP files in the source
+distribution.
 .P
 The library contains a number of undocumented internal functions and data
 tables that are used by more than one of the exported external functions, but
@@ -100,12 +100,12 @@
 .\" JOIN
   pcrepattern       syntax and semantics of supported
                       regular expressions
-  pcresyntax        quick syntax reference
   pcreperform       discussion of performance issues
   pcreposix         the POSIX-compatible C API
   pcreprecompile    details of saving and re-using precompiled patterns
   pcresample        discussion of the pcredemo program
   pcrestack         discussion of stack usage
+  pcresyntax        quick syntax reference
   pcretest          description of the \fBpcretest\fP testing command
 .sp
 In addition, in the "man" and HTML formats, there is a short page for each
@@ -148,9 +148,9 @@
 .\"
 documentation.
 .
+.
 .\" HTML <a name="utf8support"></a>
 .
-.
 .SH "UTF-8 AND UNICODE PROPERTY SUPPORT"
 .rs
 .sp
@@ -167,7 +167,7 @@
 with the PCRE_UTF8 option flag, or the pattern must start with the sequence
 (*UTF8). When either of these is the case, both the pattern and any subject
 strings that are matched against it are treated as UTF-8 strings instead of
-just strings of bytes.
+strings of 1-byte characters.
 .P
 If you compile PCRE with UTF-8 support, but do not use it at run time, the
 library will be a bit bigger, but the additional run time overhead is limited
@@ -187,6 +187,7 @@
 Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
 compatibility with Perl 5.6. PCRE does not support this.
 .
+.
 .\" HTML <a name="utf8strings"></a>
 .
 .SS "Validity of UTF-8 strings"
@@ -292,6 +293,6 @@
 .rs
 .sp
 .nf
-Last updated: 01 September 2009
+Last updated: 28 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre_compile.3
===================================================================
--- code/trunk/doc/pcre_compile.3    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/doc/pcre_compile.3    2009-10-02 08:53:31 UTC (rev 456)
@@ -52,11 +52,11 @@
   PCRE_NEWLINE_LF         Set LF as the newline sequence
   PCRE_NO_AUTO_CAPTURE    Disable numbered capturing paren-
                             theses (named ones available)
-  PCRE_UNGREEDY           Invert greediness of quantifiers
-  PCRE_UTF8               Run in UTF-8 mode
   PCRE_NO_UTF8_CHECK      Do not check the pattern for UTF-8
                             validity (only relevant if
                             PCRE_UTF8 is set)
+  PCRE_UNGREEDY           Invert greediness of quantifiers
+  PCRE_UTF8               Run in UTF-8 mode
 .sp
 PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
 PCRE_NO_UTF8_CHECK.


Modified: code/trunk/doc/pcre_compile2.3
===================================================================
--- code/trunk/doc/pcre_compile2.3    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/doc/pcre_compile2.3    2009-10-02 08:53:31 UTC (rev 456)
@@ -34,29 +34,33 @@
 .sp
 The option bits are:
 .sp
-  PCRE_ANCHORED         Force pattern anchoring
-  PCRE_AUTO_CALLOUT     Compile automatic callouts
-  PCRE_CASELESS         Do caseless matching
-  PCRE_DOLLAR_ENDONLY   $ not to match newline at end
-  PCRE_DOTALL           . matches anything including NL
-  PCRE_DUPNAMES         Allow duplicate names for subpatterns
-  PCRE_EXTENDED         Ignore whitespace and # comments
-  PCRE_EXTRA            PCRE extra features
-                          (not much use currently)
-  PCRE_FIRSTLINE        Force matching to be before newline
-  PCRE_MULTILINE        ^ and $ match newlines within data
-  PCRE_NEWLINE_ANY      Recognize any Unicode newline sequence
-  PCRE_NEWLINE_ANYCRLF  Recognize CR, LF, and CRLF as newline sequences
-  PCRE_NEWLINE_CR       Set CR as the newline sequence
-  PCRE_NEWLINE_CRLF     Set CRLF as the newline sequence
-  PCRE_NEWLINE_LF       Set LF as the newline sequence
-  PCRE_NO_AUTO_CAPTURE  Disable numbered capturing paren-
-                          theses (named ones available)
-  PCRE_UNGREEDY         Invert greediness of quantifiers
-  PCRE_UTF8             Run in UTF-8 mode
-  PCRE_NO_UTF8_CHECK    Do not check the pattern for UTF-8
-                          validity (only relevant if
-                          PCRE_UTF8 is set)
+  PCRE_ANCHORED           Force pattern anchoring
+  PCRE_AUTO_CALLOUT       Compile automatic callouts
+  PCRE_BSR_ANYCRLF        \eR matches only CR, LF, or CRLF
+  PCRE_BSR_UNICODE        \eR matches all Unicode line endings
+  PCRE_CASELESS           Do caseless matching
+  PCRE_DOLLAR_ENDONLY     $ not to match newline at end
+  PCRE_DOTALL             . matches anything including NL
+  PCRE_DUPNAMES           Allow duplicate names for subpatterns
+  PCRE_EXTENDED           Ignore whitespace and # comments
+  PCRE_EXTRA              PCRE extra features
+                            (not much use currently)
+  PCRE_FIRSTLINE          Force matching to be before newline
+  PCRE_JAVASCRIPT_COMPAT  JavaScript compatibility
+  PCRE_MULTILINE          ^ and $ match newlines within data
+  PCRE_NEWLINE_ANY        Recognize any Unicode newline sequence
+  PCRE_NEWLINE_ANYCRLF    Recognize CR, LF, and CRLF as newline 
+                            sequences
+  PCRE_NEWLINE_CR         Set CR as the newline sequence
+  PCRE_NEWLINE_CRLF       Set CRLF as the newline sequence
+  PCRE_NEWLINE_LF         Set LF as the newline sequence
+  PCRE_NO_AUTO_CAPTURE    Disable numbered capturing paren-
+                            theses (named ones available)
+  PCRE_NO_UTF8_CHECK      Do not check the pattern for UTF-8
+                            validity (only relevant if
+                            PCRE_UTF8 is set)
+  PCRE_UNGREEDY           Invert greediness of quantifiers
+  PCRE_UTF8               Run in UTF-8 mode
 .sp
 PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
 PCRE_NO_UTF8_CHECK.


Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/doc/pcreapi.3    2009-10-02 08:53:31 UTC (rev 456)
@@ -395,7 +395,9 @@
 Either of the functions \fBpcre_compile()\fP or \fBpcre_compile2()\fP can be
 called to compile a pattern into an internal form. The only difference between
 the two interfaces is that \fBpcre_compile2()\fP has an additional argument,
-\fIerrorcodeptr\fP, via which a numerical error code can be returned.
+\fIerrorcodeptr\fP, via which a numerical error code can be returned. To avoid 
+too much repetition, we refer just to \fBpcre_compile()\fP below, but the 
+information applies equally to \fBpcre_compile2()\fP.
 .P
 The pattern is a C string terminated by a binary zero, and is passed in the
 \fIpattern\fP argument. A pointer to a single block of memory that is obtained
@@ -412,23 +414,23 @@
 The \fIoptions\fP argument contains various bit settings that affect the
 compilation. It should be zero if no options are required. The available
 options are described below. Some of them (in particular, those that are
-compatible with Perl, but also some others) can also be set and unset from
+compatible with Perl, but some others as well) can also be set and unset from
 within the pattern (see the detailed description in the
 .\" HREF
 \fBpcrepattern\fP
 .\"
 documentation). For those options that can be different in different parts of
-the pattern, the contents of the \fIoptions\fP argument specifies their initial
-settings at the start of compilation and execution. The PCRE_ANCHORED and
-PCRE_NEWLINE_\fIxxx\fP options can be set at the time of matching as well as at
-compile time.
+the pattern, the contents of the \fIoptions\fP argument specifies their
+settings at the start of compilation and execution. The PCRE_ANCHORED, 
+PCRE_BSR_\fIxxx\fP, and PCRE_NEWLINE_\fIxxx\fP options can be set at the time
+of matching as well as at compile time.
 .P
 If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.
 Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns
 NULL, and sets the variable pointed to by \fIerrptr\fP to point to a textual
 error message. This is a static string that is part of the library. You must
 not try to free it. The byte offset from the start of the pattern to the
-character that was being processes when the error was discovered is placed in
+character that was being processed when the error was discovered is placed in
 the variable pointed to by \fIerroffset\fP, which must not be NULL. If it is,
 an immediate error is given. Some errors are not detected until checks are
 carried out when the whole pattern has been scanned; in this case the offset is
@@ -984,7 +986,7 @@
 .sp
 If the pattern was studied and a minimum length for matching subject strings
 was computed, its value is returned. Otherwise the returned value is -1. The
-value is a number of characters, not bytes (there may be a difference in UTF-8
+value is a number of characters, not bytes (this may be relevant in UTF-8
 mode). The fourth argument should point to an \fBint\fP variable. A
 non-negative value is a lower bound to the length of any matching string. There
 may not be any strings of that length that do actually match, but every string
@@ -1209,7 +1211,7 @@
 The \fImatch_limit\fP field provides a means of preventing PCRE from using up a
 vast amount of resources when running patterns that are not going to match,
 but which have a very large number of possibilities in their search trees. The
-classic example is the use of nested unlimited repeats.
+classic example is a pattern that uses nested unlimited repeats.
 .P
 Internally, PCRE uses a function called \fBmatch()\fP which it calls repeatedly
 (sometimes recursively). The limit set by \fImatch_limit\fP is imposed on the
@@ -1508,7 +1510,7 @@
 has to get additional memory for use during matching. Thus it is usually
 advisable to supply an \fIovector\fP.
 .P
-The \fBpcre_info()\fP function can be used to find out how many capturing
+The \fBpcre_fullinfo()\fP function can be used to find out how many capturing
 subpatterns there are in a compiled pattern. The smallest size for
 \fIovector\fP that will allow for \fIn\fP captured substrings, in addition to
 the offsets of the substring matched by the whole pattern, is (\fIn\fP+1)*3.
@@ -2043,6 +2045,6 @@
 .rs
 .sp
 .nf
-Last updated: 26 September 2009
+Last updated: 29 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
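
The (n+1)*3 sizing rule for ovector in the pcreapi.3 hunk above can be sketched as a small helper. This is only an illustration: the function name min_ovector_size is hypothetical, and in a real program the capture count would typically be obtained from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT rather than passed in by hand.

```c
#include <assert.h>

/* Hypothetical helper (not part of PCRE): smallest ovector length, in
   ints, that allows for capture_count captured substrings in addition
   to the offsets of the substring matched by the whole pattern. */
static int min_ovector_size(int capture_count)
{
    assert(capture_count >= 0);
    return (capture_count + 1) * 3;
}
```

A pattern with two capturing subpatterns would therefore need an ovector of at least min_ovector_size(2) ints.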


Modified: code/trunk/doc/pcrebuild.3
===================================================================
--- code/trunk/doc/pcrebuild.3    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/doc/pcrebuild.3    2009-10-02 08:53:31 UTC (rev 456)
@@ -1,6 +1,8 @@
 .TH PCREBUILD 3
 .SH NAME
 PCRE - Perl-compatible regular expressions
+.
+.
 .SH "PCRE BUILD-TIME OPTIONS"
 .rs
 .sp
@@ -29,6 +31,7 @@
 --enable and --disable always come in pairs, so the complementary option always
 exists as well, but as it specifies the default, it is not described.
 .
+.
 .SH "C++ SUPPORT"
 .rs
 .sp
@@ -40,6 +43,7 @@
 .sp
 to the \fBconfigure\fP command.
 .
+.
 .SH "UTF-8 SUPPORT"
 .rs
 .sp
@@ -50,7 +54,7 @@
 to the \fBconfigure\fP command. Of itself, this does not make PCRE treat
 strings as UTF-8. As well as compiling PCRE with this option, you also have
 have to set the PCRE_UTF8 option when you call the \fBpcre_compile()\fP
-function.
+or \fBpcre_compile2()\fP functions.
 .P
 If you set --enable-utf8 when compiling in an EBCDIC environment, PCRE expects
 its input to be either ASCII or UTF-8 (depending on the runtime option). It is
@@ -58,6 +62,7 @@
 library. Consequently, --enable-utf8 and --enable-ebcdic are mutually
 exclusive.
 .
+.
 .SH "UNICODE CHARACTER PROPERTY SUPPORT"
 .rs
 .sp
@@ -80,6 +85,7 @@
 .\"
 documentation.
 .
+.
 .SH "CODE VALUE OF NEWLINE"
 .rs
 .sp
@@ -112,6 +118,7 @@
 overridden when the library functions are called. At build time it is
 conventional to use the standard for your operating system.
 .
+.
 .SH "WHAT \eR MATCHES"
 .rs
 .sp
@@ -124,6 +131,7 @@
 selected when PCRE is built can be overridden when the library functions are
 called.
 .
+.
 .SH "BUILDING SHARED AND STATIC LIBRARIES"
 .rs
 .sp
@@ -135,6 +143,7 @@
 .sp
 to the \fBconfigure\fP command, as required.
 .
+.
 .SH "POSIX MALLOC USAGE"
 .rs
 .sp
@@ -154,6 +163,7 @@
 .sp
 to the \fBconfigure\fP command.
 .
+.
 .SH "HANDLING VERY LARGE PATTERNS"
 .rs
 .sp
@@ -162,8 +172,8 @@
 metacharacter). By default, two-byte values are used for these offsets, leading
 to a maximum size for a compiled pattern of around 64K. This is sufficient to
 handle all but the most gigantic patterns. Nevertheless, some people do want to
-process enormous patterns, so it is possible to compile PCRE to use three-byte
-or four-byte offsets by adding a setting such as
+process truyl enormous patterns, so it is possible to compile PCRE to use
+three-byte or four-byte offsets by adding a setting such as
 .sp
   --with-link-size=3
 .sp
@@ -171,6 +181,7 @@
 longer offsets slows down the operation of PCRE because it has to load
 additional bytes when handling them.
 .
+.
 .SH "AVOIDING EXCESSIVE STACK USAGE"
 .rs
 .sp
@@ -194,7 +205,7 @@
 \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables to call memory
 management functions. By default these point to \fBmalloc()\fP and
 \fBfree()\fP, but you can replace the pointers so that your own functions are
-used.
+used instead.
 .P
 Separate functions are provided rather than using \fBpcre_malloc\fP and
 \fBpcre_free\fP because the usage is very predictable: the block sizes
@@ -202,8 +213,9 @@
 order. A calling program might be able to implement optimized functions that
 perform better than \fBmalloc()\fP and \fBfree()\fP. PCRE runs noticeably more
 slowly when built in this way. This option affects only the \fBpcre_exec()\fP
-function; it is not relevant for the the \fBpcre_dfa_exec()\fP function.
+function; it is not relevant for \fBpcre_dfa_exec()\fP.
 .
+.
 .SH "LIMITING PCRE RESOURCE USAGE"
 .rs
 .sp
@@ -235,6 +247,7 @@
 .sp
 to the \fBconfigure\fP command. This value can also be overridden at run time.
 .
+.
 .SH "CREATING CHARACTER TABLES AT BUILD TIME"
 .rs
 .sp
@@ -253,6 +266,7 @@
 create alternative tables when cross compiling, you will have to do so "by
 hand".)
 .
+.
 .SH "USING EBCDIC CODE"
 .rs
 .sp
@@ -268,6 +282,7 @@
 an EBCDIC environment (for example, an IBM mainframe operating system). The
 --enable-ebcdic option is incompatible with --enable-utf8.
 .
+.
 .SH "PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT"
 .rs
 .sp
@@ -282,6 +297,7 @@
 relevant libraries are installed on your system. Configuration will fail if
 they are not.
 .
+.
 .SH "PCRETEST OPTION FOR LIBREADLINE SUPPORT"
 .rs
 .sp
@@ -292,7 +308,7 @@
 to the \fBconfigure\fP command, \fBpcretest\fP is linked with the
 \fBlibreadline\fP library, and when its input is from a terminal, it reads it
 using the \fBreadline()\fP function. This provides line-editing and history
-facilities. Note that \fBlibreadline\fP is GPL-licenced, so if you distribute a
+facilities. Note that \fBlibreadline\fP is GPL-licensed, so if you distribute a
 binary of \fBpcretest\fP linked in this way, there may be licensing issues.
 .P
 Setting this option causes the \fB-lreadline\fP option to be added to the
@@ -334,6 +350,6 @@
 .rs
 .sp
 .nf
-Last updated: 06 September 2009
+Last updated: 29 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
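
The relationship between the link size and the "around 64K" figure in the pcrebuild.3 text above is plain arithmetic, sketched below. The function name offset_range is hypothetical, and the result is only the number of values an offset of that width can represent; the limit that PCRE actually enforces on a compiled pattern may be lower.

```c
#include <assert.h>

/* Hypothetical helper (not part of PCRE): number of distinct values
   representable by an offset stored in link_size bytes. With the
   default of 2 bytes this is 65536, the "around 64K" in the text. */
static unsigned long long offset_range(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return 1ULL << (8 * link_size);
}
```

Building with --with-link-size=3 corresponds to offset_range(3), which is why larger link sizes accommodate much bigger patterns at the cost of loading extra bytes per offset.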


Modified: code/trunk/doc/pcrecallout.3
===================================================================
--- code/trunk/doc/pcrecallout.3    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/doc/pcrecallout.3    2009-10-02 08:53:31 UTC (rev 456)
@@ -19,9 +19,10 @@
 .sp
   (?C1)abc(?C2)def
 .sp
-If the PCRE_AUTO_CALLOUT option bit is set when \fBpcre_compile()\fP is called,
-PCRE automatically inserts callouts, all with number 255, before each item in
-the pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
+If the PCRE_AUTO_CALLOUT option bit is set when \fBpcre_compile()\fP or 
+\fBpcre_compile2()\fP is called, PCRE automatically inserts callouts, all with
+number 255, before each item in the pattern. For example, if PCRE_AUTO_CALLOUT
+is used with the pattern
 .sp
   A(\ed{2}|--)
 .sp
@@ -54,6 +55,11 @@
 the callout is never reached. However, with "abyd", though the result is still
 no match, the callout is obeyed.
 .P
+If the pattern is studied, PCRE knows the minimum length of a matching string,
+and will immediately give a "no match" return without actually running a match
+if the subject is not long enough, or, for unanchored patterns, if it has
+been scanned far enough.
+.P
 You can disable these optimizations by passing the PCRE_NO_START_OPTIMIZE
 option to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. This slows down the
 matching process, but does ensure that callouts such as the example above are
@@ -155,7 +161,7 @@
 matching proceeds as normal. If the value is greater than zero, matching fails
 at the current point, but the testing of other matching possibilities goes
 ahead, just as if a lookahead assertion had failed. If the value is less than
-zero, the match is abandoned, and \fBpcre_exec()\fP (or \fBpcre_dfa_exec()\fP)
+zero, the match is abandoned, and \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP
 returns the negative value.
 .P
 Negative values should normally be chosen from the set of PCRE_ERROR_xxx
@@ -178,6 +184,6 @@
 .rs
 .sp
 .nf
-Last updated: 15 March 2009
+Last updated: 29 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
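
The three-way interpretation of a callout function's return value described in the pcrecallout.3 hunk above can be modelled as follows. This is a sketch of the documented behaviour, not PCRE's internal code; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three outcomes described in the text. */
enum callout_outcome {
    CALLOUT_PROCEED,    /* rc == 0: matching proceeds as normal */
    CALLOUT_FAIL_HERE,  /* rc > 0: fail at this point, try other possibilities */
    CALLOUT_ABANDON     /* rc < 0: abandon the match */
};

/* Classify a callout return value. On abandonment, the negative value is
   stored in *match_result, mirroring the statement that pcre_exec() or
   pcre_dfa_exec() returns that value. */
static enum callout_outcome classify_callout_return(int rc, int *match_result)
{
    assert(match_result != NULL);
    if (rc == 0) return CALLOUT_PROCEED;
    if (rc > 0) return CALLOUT_FAIL_HERE;
    *match_result = rc;
    return CALLOUT_ABANDON;
}
```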


Modified: code/trunk/doc/pcrecompat.3
===================================================================
--- code/trunk/doc/pcrecompat.3    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/doc/pcrecompat.3    2009-10-02 08:53:31 UTC (rev 456)
@@ -5,9 +5,8 @@
 .rs
 .sp
 This document describes the differences in the ways that PCRE and Perl handle
-regular expressions. The differences described here are mainly with respect to
-Perl 5.8, though PCRE versions 7.0 and later contain some features that are
-in Perl 5.10.
+regular expressions. The differences described here are with respect to Perl
+5.10.
 .P
 1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
 it does have are given in the
@@ -86,7 +85,7 @@
 .\"
 in the
 .\" HREF
-\fBpcrecompat\fP
+\fBpcrepattern\fP
 .\"
 page.
 .P
@@ -98,15 +97,31 @@
 (*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but only in the forms without an
 argument. PCRE does not support (*MARK).
 .P
-12. PCRE provides some extensions to the Perl regular expression facilities.
-Perl 5.10 will include new features that are not in earlier versions, some of
-which (such as named parentheses) have been in PCRE for some time. This list is
-with respect to Perl 5.10:
+12. PCRE's handling of duplicate subpattern numbers and duplicate subpattern 
+names is not as general as Perl's. This is a consequence of the fact the PCRE 
+works internally just with numbers, using an external table to translate 
+between numbers and names. The following are some specific differences:
 .sp
-(a) Although lookbehind assertions must match fixed length strings, each
-alternative branch of a lookbehind assertion can match a different length of
-string. Perl requires them all to have the same length.
+(a) After matching a pattern such as (?|(?<a>A)|(?<b)B) where the two capturing 
+parentheses have the same number but different names, it is not possible to 
+distinguish which parentheses matched, because both names map to capturing
+subpattern number 1.
 .sp
+(b) A condition test for a subpattern with a name that is duplicated gives
+unpredictable results. For example, when the pattern
+(?:(?<a>A)|(?<a>B))(?('a')...|...) is compiled (the PCRE_DUPNAMES option is
+required), the condition test (?('a') is set to test whether subpattern 1 has
+matched, ignoring subpattern 2, even though it has the same name.
+.P
+13. PCRE provides some extensions to the Perl regular expression facilities.
+Perl 5.10 includes new features that are not in earlier versions of Perl, some
+of which (such as named parentheses) have been in PCRE for some time. This list
+is with respect to Perl 5.10:
+.sp
+(a) Although lookbehind assertions in PCRE must match fixed length strings,
+each alternative branch of a lookbehind assertion can match a different length
+of string. Perl requires them all to have the same length.
+.sp
 (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
 meta-character matches only at the very end of the string.
 .sp
@@ -155,6 +170,6 @@
 .rs
 .sp
 .nf
-Last updated: 18 September 2009
+Last updated: 29 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcrematching.3
===================================================================
--- code/trunk/doc/pcrematching.3    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/doc/pcrematching.3    2009-10-02 08:53:31 UTC (rev 456)
@@ -74,13 +74,17 @@
 traditional finite state machine (it keeps multiple states active
 simultaneously).
 .P
+Although the general principle of this matching algorithm is that it scans the 
+subject string only once, without backtracking, there is one exception: when a 
+lookaround assertion is encountered, the characters following or preceding the 
+current point have to be independently inspected.
+.P
 The scan continues until either the end of the subject is reached, or there are
 no more unterminated paths. At this point, terminated paths represent the
 different matching possibilities (if there are none, the match has failed).
 Thus, if there is more than one possible match, this algorithm finds all of
-them, and in particular, it finds the longest. In PCRE, there is an option to
-stop the algorithm after the first match (which is necessarily the shortest)
-has been found.
+them, and in particular, it finds the longest. There is an option to stop the
+algorithm after the first match (which is necessarily the shortest) is found.
 .P
 Note that all the matches that are found start at the same point in the
 subject. If the pattern
@@ -92,11 +96,6 @@
 character of the subject. The algorithm does not automatically move on to find
 matches that start at later positions.
 .P
-Although the general principle of this matching algorithm is that it scans the 
-subject string only once, without backtracking, there is one exception: when a 
-lookbehind assertion is encountered, the preceding characters have to be
-re-inspected.
-.P
 There are a number of features of PCRE regular expressions that are not
 supported by the alternative matching algorithm. They are as follows:
 .P
@@ -152,8 +151,13 @@
 2. Because the alternative algorithm scans the subject string just once, and
 never needs to backtrack, it is possible to pass very long subject strings to
 the matching function in several pieces, checking for partial matching each
-time.
+time. The
+.\" HREF                                                                
+\fBpcrepartial\fP
+.\"                                                           
+documentation gives details of partial matching.
 .
+.
 .SH "DISADVANTAGES OF THE ALTERNATIVE ALGORITHM"
 .rs
 .sp
@@ -183,6 +187,6 @@
 .rs
 .sp
 .nf
-Last updated: 05 September 2009
+Last updated: 29 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcrepartial.3
===================================================================
--- code/trunk/doc/pcrepartial.3    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/doc/pcrepartial.3    2009-10-02 08:53:31 UTC (rev 456)
@@ -32,10 +32,13 @@
 though the details differ between the two matching functions. If both options
 are set, PCRE_PARTIAL_HARD takes precedence.
 .P
-Setting a partial matching option disables one of PCRE's optimizations. PCRE
+Setting a partial matching option disables two of PCRE's optimizations. PCRE
 remembers the last literal byte in a pattern, and abandons matching immediately
 if such a byte is not present in the subject string. This optimization cannot
-be used for a subject string that might match only partially.
+be used for a subject string that might match only partially. If the pattern 
+was studied, PCRE knows the minimum length of a matching string, and does not 
+bother to run the matching function on shorter strings. This optimization is 
+also disabled for partial matching.
 .
 .
 .SH "PARTIAL MATCHING USING pcre_exec()"
@@ -53,7 +56,7 @@
 vector, the first of them is set to the offset of the earliest character that
 was inspected when the partial match was found. For convenience, the second
 offset points to the end of the string so that a substring can easily be
-extracted.
+identified.
 .P
 For the majority of patterns, the first offset identifies the start of the
 partially matched string. However, for patterns that contain lookbehind
@@ -358,6 +361,6 @@
 .rs
 .sp
 .nf
-Last updated: 05 September 2009
+Last updated: 29 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
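
The minimum-length optimization mentioned in the pcrepartial.3 hunk above, and the fact that it is disabled for partial matching, can be expressed as a simple predicate. This is an illustrative sketch only: can_skip_match is a hypothetical name, lengths are assumed to be counted in characters, and -1 is used for "no minimum length known" to match the value that pcreapi.3 says is returned when none was computed.

```c
#include <assert.h>

/* Hypothetical predicate (not part of PCRE): returns 1 if a "no match"
   result can be given without running the matching function.
   min_length is the studied minimum length, or -1 if none is known;
   partial is nonzero when a partial matching option is set. */
static int can_skip_match(int min_length, int subject_length, int partial)
{
    assert(subject_length >= 0);
    if (partial) return 0;         /* optimization disabled for partial matching */
    if (min_length < 0) return 0;  /* pattern not studied, or no length computed */
    return subject_length < min_length;
}
```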


Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/doc/pcrepattern.3    2009-10-02 08:53:31 UTC (rev 456)
@@ -21,10 +21,10 @@
 description of PCRE's regular expressions is intended as reference material.
 .P
 The original operation of PCRE was on strings of one-byte characters. However,
-there is now also support for UTF-8 character strings. To use this, you must
-build PCRE to include UTF-8 support, and then call \fBpcre_compile()\fP with
-the PCRE_UTF8 option. There is also a special sequence that can be given at the
-start of a pattern:
+there is now also support for UTF-8 character strings. To use this, 
+PCRE must be built to include UTF-8 support, and you must call
+\fBpcre_compile()\fP or \fBpcre_compile2()\fP with the PCRE_UTF8 option. There
+is also a special sequence that can be given at the start of a pattern:
 .sp
   (*UTF8)
 .sp
@@ -83,8 +83,9 @@
   (*ANYCRLF)   any of the three above
   (*ANY)       all Unicode newline sequences
 .sp
-These override the default and the options given to \fBpcre_compile()\fP. For
-example, on a Unix system where LF is the default newline sequence, the pattern
+These override the default and the options given to \fBpcre_compile()\fP or 
+\fBpcre_compile2()\fP. For example, on a Unix system where LF is the default
+newline sequence, the pattern
 .sp
   (*CR)a.b
 .sp
@@ -206,9 +207,8 @@
 A second use of backslash provides a way of encoding non-printing characters
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters, apart from the binary zero that terminates a pattern,
-but when a pattern is being prepared by text editing, it is usually easier to
-use one of the following escape sequences than the binary character it
-represents:
+but when a pattern is being prepared by text editing, it is often easier to use
+one of the following escape sequences than the binary character it represents:
 .sp
   \ea        alarm, that is, the BEL character (hex 07)
   \ecx       "control-x", where x is any character
@@ -468,12 +468,13 @@
   (*BSR_ANYCRLF)   CR, LF, or CRLF only
   (*BSR_UNICODE)   any Unicode newline sequence
 .sp
-These override the default and the options given to \fBpcre_compile()\fP, but
-they can be overridden by options given to \fBpcre_exec()\fP. Note that these
-special settings, which are not Perl-compatible, are recognized only at the
-very start of a pattern, and that they must be in upper case. If more than one
-of them is present, the last one is used. They can be combined with a change of
-newline convention, for example, a pattern can start with:
+These override the default and the options given to \fBpcre_compile()\fP or 
+\fBpcre_compile2()\fP, but they can be overridden by options given to
+\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. Note that these special settings,
+which are not Perl-compatible, are recognized only at the very start of a
+pattern, and that they must be in upper case. If more than one of them is
+present, the last one is used. They can be combined with a change of newline
+convention, for example, a pattern can start with:
 .sp
   (*ANY)(*BSR_ANYCRLF)
 .sp
@@ -740,7 +741,10 @@
 A word boundary is a position in the subject string where the current character
 and the previous character do not both match \ew or \eW (i.e. one matches
 \ew and the other matches \eW), or the start or end of the string if the
-first or last character matches \ew, respectively.
+first or last character matches \ew, respectively. Neither PCRE nor Perl has a 
+separate "start of word" or "end of word" metasequence. However, whatever
+follows \eb normally determines which it is. For example, the fragment 
+\eba matches "a" at the start of a word.
 .P
 The \eA, \eZ, and \ez assertions differ from the traditional circumflex and
 dollar (described in the next section) in that they only ever match at the very
@@ -872,14 +876,15 @@
 .rs
 .sp
 An opening square bracket introduces a character class, terminated by a closing
-square bracket. A closing square bracket on its own is not special. If a
-closing square bracket is required as a member of the class, it should be the
-first data character in the class (after an initial circumflex, if present) or
-escaped with a backslash.
+square bracket. A closing square bracket on its own is not special by default. 
+However, if the PCRE_JAVASCRIPT_COMPAT option is set, a lone closing square 
+bracket causes a compile-time error. If a closing square bracket is required as
+a member of the class, it should be the first data character in the class
+(after an initial circumflex, if present) or escaped with a backslash.
 .P
 A character class matches a single character in the subject. In UTF-8 mode, the
-character may occupy more than one byte. A matched character must be in the set
-of characters defined by the class, unless the first character in the class
+character may be more than one byte long. A matched character must be in the
+set of characters defined by the class, unless the first character in the class
 definition is a circumflex, in which case the subject character must not be in
 the set defined by the class. If a circumflex is actually required as a member
 of the class, ensure it is not the first character, or escape it with a
@@ -889,7 +894,7 @@
 [^aeiou] matches any character that is not a lower case vowel. Note that a
 circumflex is just a convenient notation for specifying the characters that
 are in the class by enumerating those that are not. A class that starts with a
-circumflex is not an assertion: it still consumes a character from the subject
+circumflex is not an assertion; it still consumes a character from the subject
 string, and therefore it fails if the current pointer is at the end of the
 string.
 .P
@@ -903,9 +908,9 @@
 case for characters whose values are less than 128, so caseless matching is
 always possible. For characters with higher values, the concept of case is
 supported if PCRE is compiled with Unicode property support, but not otherwise.
-If you want to use caseless matching for characters 128 and above, you must
-ensure that PCRE is compiled with Unicode property support as well as with
-UTF-8 support.
+If you want to use caseless matching in UTF-8 mode for characters 128 and above,
+you must ensure that PCRE is compiled with Unicode property support as well as
+with UTF-8 support.
 .P
 Characters that might indicate line breaks are never treated in any special way
 when matching character classes, whatever line-ending sequence is in use, and
@@ -1132,6 +1137,7 @@
 the above patterns match "SUNDAY" as well as "Saturday".
 .
 .
+.\" HTML <a name="dupsubpatternnumber"></a>
 .SH "DUPLICATE SUBPATTERN NUMBERS"
 .rs
 .sp
@@ -1157,10 +1163,20 @@
   / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
   # 1            2         2  3        2     3     4
 .sp
-A backreference or a recursive call to a numbered subpattern always refers to
-the first one in the pattern with the given number.
+A backreference to a numbered subpattern uses the most recent value that is set 
+for that number by any subpattern. The following pattern matches "abcabc" or
+"defdef":
+.sp
+  /(?|(abc)|(def))\1/
+.sp
+In contrast, a recursive or "subroutine" call to a numbered subpattern always
+refers to the first one in the pattern with the given number. The following 
+pattern matches "abcabc" or "defabc":
+.sp
+  /(?|(abc)|(def))(?1)/
+.sp
 .P
-An alternative approach to using this "branch reset" feature is to use
+An alternative approach to using the "branch reset" feature is to use
 duplicate named subpatterns, as described in the next section.
 .
 .
@@ -1247,6 +1263,7 @@
   a character class
   a back reference (see next section)
   a parenthesized subpattern (unless it is an assertion)
+  a recursive or "subroutine" call to a subpattern 
 .sp
 The general repetition quantifier specifies a minimum and maximum number of
 permitted matches, by giving the two numbers in curly brackets (braces),
@@ -1568,16 +1585,19 @@
 .P
 There may be more than one back reference to the same subpattern. If a
 subpattern has not actually been used in a particular match, any back
-references to it always fail. For example, the pattern
+references to it always fail by default. For example, the pattern
 .sp
   (a|(bc))\e2
 .sp
-always fails if it starts to match "a" rather than "bc". Because there may be
-many capturing parentheses in a pattern, all digits following the backslash are
-taken as part of a potential back reference number. If the pattern continues
-with a digit character, some delimiter must be used to terminate the back
-reference. If the PCRE_EXTENDED option is set, this can be whitespace.
-Otherwise an empty comment (see
+always fails if it starts to match "a" rather than "bc". However, if the 
+PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back reference to an 
+unset value matches an empty string.
+.P
+Because there may be many capturing parentheses in a pattern, all digits
+following a backslash are taken as part of a potential back reference number.
+If the pattern continues with a digit character, some delimiter must be used to
+terminate the back reference. If the PCRE_EXTENDED option is set, this can be
+whitespace. Otherwise, the \eg{ syntax or an empty comment (see
 .\" HTML <a href="#comments">
 .\" </a>
 "Comments"
@@ -1650,6 +1670,8 @@
 If you want to force a matching failure at some point in a pattern, the most
 convenient way to do it is with (?!) because an empty string always matches, so
 an assertion that requires there not to be an empty string must always fail.
+The Perl 5.10 backtracking control verb (*FAIL) or (*F) is essentially a
+synonym for (?!).
 .
 .
 .\" HTML <a name="lookbehind"></a>
@@ -1716,8 +1738,8 @@
 however, is not supported.
 .P
 Possessive quantifiers can be used in conjunction with lookbehind assertions to
-specify efficient matching at the end of the subject string. Consider a simple
-pattern such as
+specify efficient matching of fixed-length strings at the end of subject
+strings. Consider a simple pattern such as
 .sp
   abcd$
 .sp
@@ -1781,8 +1803,8 @@
 .sp
 It is possible to cause the matching process to obey a subpattern
 conditionally or to choose between two alternative subpatterns, depending on
-the result of an assertion, or whether a previous capturing subpattern matched
-or not. The two possible forms of conditional subpattern are
+the result of an assertion, or whether a specific capturing subpattern has 
+already been matched. The two possible forms of conditional subpattern are:
 .sp
   (?(condition)yes-pattern)
   (?(condition)yes-pattern|no-pattern)
@@ -1798,12 +1820,20 @@
 .rs
 .sp
 If the text between the parentheses consists of a sequence of digits, the
-condition is true if the capturing subpattern of that number has previously
-matched. An alternative notation is to precede the digits with a plus or minus
-sign. In this case, the subpattern number is relative rather than absolute.
-The most recently opened parentheses can be referenced by (?(-1), the next most
-recent by (?(-2), and so on. In looping constructs it can also make sense to
-refer to subsequent groups with constructs such as (?(+2).
+condition is true if a capturing subpattern of that number has previously
+matched. If there is more than one capturing subpattern with the same number 
+(see the earlier 
+.\"
+.\" HTML <a href="#dupsubpatternnumber">
+.\" </a>
+section about duplicate subpattern numbers),
+.\"
+the condition is true if any of them have been set. An alternative notation is
+to precede the digits with a plus or minus sign. In this case, the subpattern
+number is relative rather than absolute. The most recently opened parentheses
+can be referenced by (?(-1), the next most recent by (?(-2), and so on. In
+looping constructs it can also make sense to refer to subsequent groups with
+constructs such as (?(+2).
 .P
 Consider the following pattern, which contains non-significant white space to
 make it more readable (assume the PCRE_EXTENDED option) and to divide it into
@@ -1855,7 +1885,7 @@
 .sp
   (?(R3)...) or (?(R&name)...)
 .sp
-the condition is true if the most recent recursion is into the subpattern whose
+the condition is true if the most recent recursion is into a subpattern whose
 number or name is given. This condition does not check the entire recursion
 stack.
 .P
@@ -1887,11 +1917,9 @@
 The first part of the pattern is a DEFINE group inside which another group
 named "byte" is defined. This matches an individual component of an IPv4
 address (a number less than 256). When matching takes place, this part of the
-pattern is skipped because DEFINE acts like a false condition.
-.P
-The rest of the pattern uses references to the named group to match the four
-dot-separated components of an IPv4 address, insisting on a word boundary at
-each end.
+pattern is skipped because DEFINE acts like a false condition. The rest of the
+pattern uses references to the named group to match the four dot-separated
+components of an IPv4 address, insisting on a word boundary at each end.
 .
 .SS "Assertion conditions"
 .rs
@@ -1963,23 +1991,24 @@
 This PCRE pattern solves the nested parentheses problem (assume the
 PCRE_EXTENDED option is set so that white space is ignored):
 .sp
-  \e( ( (?>[^()]+) | (?R) )* \e)
+  \e( ( [^()]++ | (?R) )* \e)
 .sp
 First it matches an opening parenthesis. Then it matches any number of
 substrings which can either be a sequence of non-parentheses, or a recursive
 match of the pattern itself (that is, a correctly parenthesized substring).
-Finally there is a closing parenthesis.
+Finally there is a closing parenthesis. Note the use of a possessive quantifier 
+to avoid backtracking into sequences of non-parentheses.
 .P
 If this were part of a larger pattern, you would not want to recurse the entire
 pattern, so instead you could use this:
 .sp
-  ( \e( ( (?>[^()]+) | (?1) )* \e) )
+  ( \e( ( [^()]++ | (?1) )* \e) )
 .sp
 We have put the pattern into parentheses, and caused the recursion to refer to
 them instead of the whole pattern.
 .P
 In a larger pattern, keeping track of parenthesis numbers can be tricky. This
-is made easier by the use of relative references. (A Perl 5.10 feature.)
+is made easier by the use of relative references (a Perl 5.10 feature).
 Instead of (?1) in the pattern above you can write (?-2) to refer to the second
 most recently opened parentheses preceding the recursion. In other words, a
 negative number counts capturing parentheses leftwards from the point at which
@@ -1998,19 +2027,19 @@
 for this is (?&name); PCRE's earlier syntax (?P>name) is also supported. We
 could rewrite the above example as follows:
 .sp
-  (?<pn> \e( ( (?>[^()]+) | (?&pn) )* \e) )
+  (?<pn> \e( ( [^()]++ | (?&pn) )* \e) )
 .sp
 If there is more than one subpattern with the same name, the earliest one is
 used.
 .P
 This particular example pattern that we have been looking at contains nested
-unlimited repeats, and so the use of atomic grouping for matching strings of
-non-parentheses is important when applying the pattern to strings that do not
-match. For example, when this pattern is applied to
+unlimited repeats, and so the use of a possessive quantifier for matching
+strings of non-parentheses is important when applying the pattern to strings
+that do not match. For example, when this pattern is applied to
 .sp
   (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
 .sp
-it yields "no match" quickly. However, if atomic grouping is not used,
+it yields "no match" quickly. However, if a possessive quantifier is not used,
 the match runs for a very long time indeed because there are so many different
 ways the + and * repeats can carve up the subject, and all have to be tested
 before failure can be reported.
@@ -2029,7 +2058,7 @@
 the value for the capturing parentheses is "ef", which is the last value taken
 on at the top level. If additional parentheses are added, giving
 .sp
-  \e( ( ( (?>[^()]+) | (?R) )* ) \e)
+  \e( ( ( [^()]++ | (?R) )* ) \e)
      ^                        ^
      ^                        ^
 .sp
@@ -2113,6 +2142,13 @@
 non-word characters. Without this, PCRE takes a great deal longer (ten times or
 more) to match typical phrases, and Perl takes so long that you think it has
 gone into a loop.
+.P
+\fBWARNING\fP: The palindrome-matching patterns above work only if the subject
+string does not start with a palindrome that is shorter than the entire string.
+For example, although "abcba" is correctly matched, if the subject is "ababa",
+PCRE finds the palindrome "aba" at the start, then fails at top level because
+the end of the string does not follow. Once again, it cannot jump back into the
+recursion to try other alternatives, so the entire match fails.
 .
 .
 .\" HTML <a name="subpatternsassubroutines"></a>
@@ -2248,8 +2284,8 @@
 .sp
 This verb causes the match to end successfully, skipping the remainder of the
 pattern. When inside a recursion, only the innermost pattern is ended
-immediately. If the (*ACCEPT) is inside capturing parentheses, the data so far
-is captured. (This feature was added to PCRE at release 8.00.) For example:
+immediately. If (*ACCEPT) is inside capturing parentheses, the data so far is
+captured. (This feature was added to PCRE at release 8.00.) For example:
 .sp
   A((?:A|B(*ACCEPT)|C)D)
 .sp
@@ -2280,7 +2316,7 @@
 .sp
 This verb causes the whole match to fail outright if the rest of the pattern
 does not match. Even if the pattern is unanchored, no further attempts to find
-a match by advancing the start point take place. Once (*COMMIT) has been
+a match by advancing the starting point take place. Once (*COMMIT) has been
 passed, \fBpcre_exec()\fP is committed to finding a match at the current
 starting point, or not at all. For example:
 .sp
@@ -2312,7 +2348,7 @@
 If the subject is "aaaac...", after the first match attempt fails (starting at
 the first character in the string), the starting point skips on to start the
 next attempt at "c". Note that a possessive quantifier does not have the same
-effect in this example; although it would suppress backtracking during the
+effect as this example; although it would suppress backtracking during the
 first match attempt, the second attempt would start at the second character
 instead of skipping on to "c".
 .sp
@@ -2334,7 +2370,8 @@
 .SH "SEE ALSO"
 .rs
 .sp
-\fBpcreapi\fP(3), \fBpcrecallout\fP(3), \fBpcrematching\fP(3), \fBpcre\fP(3).
+\fBpcreapi\fP(3), \fBpcrecallout\fP(3), \fBpcrematching\fP(3), 
+\fBpcresyntax\fP(3), \fBpcre\fP(3).
 .
 .
 .SH AUTHOR
@@ -2351,6 +2388,6 @@
 .rs
 .sp
 .nf
-Last updated: 22 September 2009
+Last updated: 30 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcresample.3
===================================================================
--- code/trunk/doc/pcresample.3    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/doc/pcresample.3    2009-10-02 08:53:31 UTC (rev 456)
@@ -25,8 +25,8 @@
 an empty string. Comments in the code explain what is going on.
 .P
 If PCRE is installed in the standard include and library directories for your
-system, you should be able to compile the demonstration program using this
-command:
+operating system, you should be able to compile the demonstration program using
+this command:
 .sp
   gcc -o pcredemo pcredemo.c -lpcre
 .sp
@@ -87,6 +87,6 @@
 .rs
 .sp
 .nf
-Last updated: 01 September 2009
+Last updated: 30 September 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi


Modified: code/trunk/doc/perltest.txt
===================================================================
--- code/trunk/doc/perltest.txt    2009-09-26 19:12:32 UTC (rev 455)
+++ code/trunk/doc/perltest.txt    2009-10-02 08:53:31 UTC (rev 456)
@@ -1,7 +1,7 @@
 The perltest program
 --------------------


-The perltest program tests Perl's regular expressions; it has the same
+The perltest.pl script tests Perl's regular expressions; it has the same
 specification as pcretest, and so can be given identical input, except that
 input patterns can be followed only by Perl's lower case modifiers and /+ (as
 used by pcretest), which is recognized and handled by the program.
@@ -14,20 +14,14 @@
 escapes, are not used in these files. The output should be identical, apart
 from the initial identifying banner.

-The perltest script can also test UTF-8 features. It works as is for Perl 5.8
-or higher. It recognizes the special modifier /8 that pcretest uses to invoke
-UTF-8 functionality. The testinput4 file can be fed to perltest to run
-compatible UTF-8 tests.
+The perltest.pl script can also test UTF-8 features. It recognizes the special
+modifier /8 that pcretest uses to invoke UTF-8 functionality. The testinput4
+file can be fed to perltest to run compatible UTF-8 tests.

-For Perl 5.6, perltest won't work unmodified for the UTF-8 tests. You need to
-uncomment the "use utf8" lines that it contains. It is best to do this on a
-copy of the script, because for non-UTF-8 tests, these lines should remain
-commented out.
+The other testinput files are not suitable for feeding to perltest.pl, since
+they make use of the special upper case modifiers and escapes that pcretest
+uses to test some features of PCRE. Some of these files also contain malformed
+regular expressions, in order to check that PCRE diagnoses them correctly.

-The other testinput files are not suitable for feeding to perltest, since they
-make use of the special upper case modifiers and escapes that pcretest uses to
-test some features of PCRE. Some of these files also contains malformed regular
-expressions, in order to check that PCRE diagnoses them correctly.
-
 Philip Hazel
-September 2004
+September 2009