[Pcre-svn] [153] code/trunk: Tests and documentation updates…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [153] code/trunk: Tests and documentation updates.
Revision: 153
          http://www.exim.org/viewvc/pcre2?view=rev&revision=153
Author:   ph10
Date:     2014-11-18 18:32:12 +0000 (Tue, 18 Nov 2014)


Log Message:
-----------
Tests and documentation updates.

Modified Paths:
--------------
    code/trunk/doc/pcre2.3
    code/trunk/doc/pcre2api.3
    code/trunk/maint/ManyConfigTests
    code/trunk/maint/README


Modified: code/trunk/doc/pcre2.3
===================================================================
--- code/trunk/doc/pcre2.3    2014-11-18 18:31:39 UTC (rev 152)
+++ code/trunk/doc/pcre2.3    2014-11-18 18:32:12 UTC (rev 153)
@@ -1,4 +1,4 @@
-.TH PCRE2 3 "03 November 2014" "PCRE2 10.00"
+.TH PCRE2 3 "18 November 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH INTRODUCTION
@@ -8,9 +8,10 @@
 of functions, written in C, that implement regular expression pattern matching
 using the same syntax and semantics as Perl, with just a few differences. Some
 features that appeared in Python and the original PCRE before they appeared in
-Perl are also available using the Python syntax, there is some support for one
-or two .NET and Oniguruma syntax items, and there are options for requesting
-some minor changes that give better ECMAScript (aka JavaScript) compatibility.
+Perl are also available using the Python syntax. There is also some support for
+one or two .NET and Oniguruma syntax items, and there are options for
+requesting some minor changes that give better ECMAScript (aka JavaScript)
+compatibility.
 .P
 The source code for PCRE2 can be compiled to support 8-bit, 16-bit, or 32-bit
 code units, which means that up to three separate libraries may be installed.
@@ -18,7 +19,7 @@
 Zoltan Herczeg and Christian Persch, respectively. In all three cases, strings
 can be interpreted either as one character per code unit, or as UTF-encoded
 Unicode, with support for Unicode general category properties. Unicode support
-is optional at build time (but is the default); however, processing strings as
+is optional at build time (but is the default). However, processing strings as
 UTF code units must be enabled explicitly at run time. The version of Unicode
 in use can be discovered by running
 .sp
@@ -140,19 +141,19 @@
   pcre2compat        discussion of Perl compatibility
   pcre2demo          a demonstration C program that uses PCRE2
   pcre2grep          description of the \fBpcre2grep\fP command (8-bit only)
-  pcre2jit           discussion of the just-in-time optimization support
+  pcre2jit           discussion of just-in-time optimization support
   pcre2limits        details of size and other limits
   pcre2matching      discussion of the two matching algorithms
   pcre2partial       details of the partial matching facility
 .\" JOIN
-  pcre2pattern       syntax and semantics of supported
-                      regular expressions
+  pcre2pattern       syntax and semantics of supported regular 
+                       expression patterns
   pcre2perform       discussion of performance issues
   pcre2posix         the POSIX-compatible C API for the 8-bit library
   pcre2sample        discussion of the pcre2demo program
   pcre2stack         discussion of stack usage
   pcre2syntax        quick syntax reference
-  pcre2test          description of the \fBpcre2test\fP testing command
+  pcre2test          description of the \fBpcre2test\fP command
   pcre2unicode       discussion of Unicode and UTF support
 .sp
 In the "man" and HTML formats, there is also a short page for each C library
@@ -176,6 +177,6 @@
 .rs
 .sp
 .nf
-Last updated: 03 November 2014
+Last updated: 18 November 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2014-11-18 18:31:39 UTC (rev 152)
+++ code/trunk/doc/pcre2api.3    2014-11-18 18:32:12 UTC (rev 153)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "11 November 2014" "PCRE2 10.00"
+.TH PCRE2API 3 "18 November 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -384,12 +384,9 @@
 .P
 Each of the first three conventions is used by at least one operating system as
 its standard newline sequence. When PCRE2 is built, a default can be specified.
-The default default is LF, which is the Unix standard. When PCRE2 is run, the
-default can be overridden, either when a pattern is compiled, or when it is
-matched.
-.P
-The newline convention can be changed when calling \fBpcre2_compile()\fP, or it
-can be specified by special text at the start of the pattern itself; this
+The default default is LF, which is the Unix standard. However, the newline
+convention can be changed by an application when calling \fBpcre2_compile()\fP,
+or it can be specified by special text at the start of the pattern itself; this
 overrides any other settings. See the
 .\" HREF
 \fBpcre2pattern\fP
@@ -409,8 +406,8 @@
 below.
 .P
 The choice of newline convention does not affect the interpretation of
-the \en or \er escape sequences, nor does it affect what \eR matches, which has
-its own separate control.
+the \en or \er escape sequences, nor does it affect what \eR matches; this has
+its own separate convention.
 .
 .
 .SH MULTITHREADING
@@ -423,7 +420,7 @@
 time ensuring that multithreaded applications can use it.
 .P
 There are several different blocks of data that are used to pass information
-between the application and the PCRE libraries.
+between the application and the PCRE2 libraries.
 .P
 (1) A pointer to the compiled form of a pattern is returned to the user when
 \fBpcre2_compile()\fP is successful. The data in the compiled pattern is fixed,
@@ -529,11 +526,11 @@
 A compile context is required if you want to change the default values of any
 of the following compile-time parameters:
 .sp
-  What \eR matches (Unicode newlines or CR, LF, CRLF only);
-  PCRE2's character tables;
-  The newline character sequence;
-  The compile time nested parentheses limit;
-  An external function for stack checking.
+  What \eR matches (Unicode newlines or CR, LF, CRLF only)
+  PCRE2's character tables
+  The newline character sequence
+  The compile time nested parentheses limit
+  An external function for stack checking
 .sp
 A compile context is also required if you are using custom memory management.
 If none of these apply, just pass NULL as the context argument of
@@ -562,9 +559,8 @@
 .sp
 The value must be PCRE2_BSR_ANYCRLF, to specify that \eR matches only CR, LF,
 or CRLF, or PCRE2_BSR_UNICODE, to specify that \eR matches any Unicode line
-ending sequence. The value of this parameter does not affect what is compiled;
-it is just saved with the compiled pattern. The value is used by the JIT
-compiler and by the two interpreted matching functions, \fIpcre2_match()\fP and
+ending sequence. The value is used by the JIT compiler and by the two
+interpreted matching functions, \fIpcre2_match()\fP and
 \fIpcre2_dfa_match()\fP.
 .sp
 .nf
@@ -678,12 +674,12 @@
 in the subject string. This limit is not relevant to \fBpcre2_dfa_match()\fP,
 which ignores it.
 .P
-When \fBpcre2_match()\fP is called with a pattern that was successfully studied
-with \fBpcre2_jit_compile()\fP, the way that the matching is executed is
-entirely different. However, there is still the possibility of runaway matching
-that goes on for a very long time, and so the \fImatch_limit\fP value is also
-used in this case (but in a different way) to limit how long the matching can
-continue.
+When \fBpcre2_match()\fP is called with a pattern that was successfully 
+processed by \fBpcre2_jit_compile()\fP, the way in which matching is executed
+is entirely different. However, there is still the possibility of runaway
+matching that goes on for a very long time, and so the \fImatch_limit\fP value
+is also used in this case (but in a different way) to limit how long the
+matching can continue.
 .P
 The default value for the limit can be set when PCRE2 is built; the default
 default is 10 million, which handles all but the most extreme cases. If the
@@ -744,15 +740,16 @@
 .\" HREF
 \fBpcre2build\fP
 .\"
-documentation for details of how to build PCRE2. Using the heap for recursion
-is a non-standard way of building PCRE2, for use in environments that have
-limited stacks. Because of the greater use of memory management,
-\fBpcre2_match()\fP runs more slowly. Functions that are different to the
-general custom memory functions are provided so that special-purpose external
-code can be used for this case, because the memory blocks are all the same
-size. The blocks are retained by \fBpcre2_match()\fP until it is about to exit
-so that they can be re-used when possible during the match. In the absence of
-these functions, the normal custom memory management functions are used, if
+documentation for details of how to build PCRE2. 
+.P
+Using the heap for recursion is a non-standard way of building PCRE2, for use
+in environments that have limited stacks. Because of the greater use of memory
+management, \fBpcre2_match()\fP runs more slowly. Functions that are different
+to the general custom memory functions are provided so that special-purpose
+external code can be used for this case, because the memory blocks are all the
+same size. The blocks are retained by \fBpcre2_match()\fP until it is about to
+exit so that they can be re-used when possible during the match. In the absence
+of these functions, the normal custom memory management functions are used, if
 supplied, otherwise the system functions.
 .
 .
@@ -784,9 +781,10 @@
   PCRE2_CONFIG_BSR
 .sp
 The output is an integer whose value indicates what character sequences the \eR
-escape sequence matches by default. A value of 0 means that \eR matches any
-Unicode line ending sequence; a value of 1 means that \eR matches only CR, LF,
-or CRLF. The default can be overridden when a pattern is compiled or matched.
+escape sequence matches by default. A value of PCRE2_BSR_UNICODE means that \eR
+matches any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means
+that \eR matches only CR, LF, or CRLF. The default can be overridden when a
+pattern is compiled.
 .sp
   PCRE2_CONFIG_JIT
 .sp
@@ -796,7 +794,7 @@
   PCRE2_CONFIG_JITTARGET
 .sp
 The \fIwhere\fP argument should point to a buffer that is at least 48 code
-units long. (The exact length needed can be found by calling
+units long. (The exact length required can be found by calling
 \fBpcre2_config()\fP with \fBwhere\fP set to NULL.) The buffer is filled with a
 string that contains the name of the architecture for which the JIT compiler is
 configured, for example "x86 32bit (little endian + unaligned)". If JIT support
@@ -829,11 +827,11 @@
 The output is an integer whose value specifies the default character sequence
 that is recognized as meaning "newline". The values are:
 .sp
-  1  Carriage return (CR)
-  2  Linefeed (LF)
-  3  Carriage return, linefeed (CRLF)
-  4  Any Unicode line ending
-  5  Any of CR, LF, or CRLF
+  PCRE2_NEWLINE_CR       Carriage return (CR)
+  PCRE2_NEWLINE_LF       Linefeed (LF)
+  PCRE2_NEWLINE_CRLF     Carriage return, linefeed (CRLF)
+  PCRE2_NEWLINE_ANY      Any Unicode line ending
+  PCRE2_NEWLINE_ANYCRLF  Any of CR, LF, or CRLF
 .sp
 The default should normally correspond to the standard sequence for your
 operating system.
@@ -865,7 +863,7 @@
   PCRE2_CONFIG_UNICODE_VERSION
 .sp
 The \fIwhere\fP argument should point to a buffer that is at least 24 code
-units long. (The exact length needed can be found by calling
+units long. (The exact length required can be found by calling
 \fBpcre2_config()\fP with \fBwhere\fP set to NULL.) If PCRE2 has been compiled
 without Unicode support, the buffer is filled with the text "Unicode not
 supported". Otherwise, the Unicode version string (for example, "7.0.0") is
@@ -880,7 +878,7 @@
   PCRE2_CONFIG_VERSION
 .sp
 The \fIwhere\fP argument should point to a buffer that is at least 12 code
-units long. (The exact length needed can be found by calling
+units long. (The exact length required can be found by calling
 \fBpcre2_config()\fP with \fBwhere\fP set to NULL.) The buffer is filled with
 the PCRE2 version string, zero-terminated. The number of code units used is
 returned. This is the length of the string plus one unit for the terminating
@@ -899,16 +897,16 @@
 .B pcre2_code_free(pcre2_code *\fIcode\fP);
 .fi
 .P
-This function compiles a pattern, defined by a pointer to a string of code
-units and a length, into an internal form. If the pattern is zero-terminated,
-the length should be specified as PCRE2_ZERO_TERMINATED. The function returns a
-pointer to a block of memory that contains the compiled pattern and related
-data. The caller must free the memory by calling \fBpcre2_code_free()\fP when
-it is no longer needed.
+The \fBpcre2_compile()\fP function compiles a pattern into an internal form.
+The pattern is defined by a pointer to a string of code units and a length. If
+the pattern is zero-terminated, the length can be specified as
+PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of memory that
+contains the compiled pattern and related data. The caller must free the memory
+by calling \fBpcre2_code_free()\fP when it is no longer needed.
 .P
-If the compile context argument \fIccontext\fP is NULL, the memory is obtained
-by calling \fBmalloc()\fP. Otherwise, it is obtained from the same memory
-function that was used for the compile context.
+If the compile context argument \fIccontext\fP is NULL, memory for the compiled 
+pattern is obtained by calling \fBmalloc()\fP. Otherwise, it is obtained from
+the same memory function that was used for the compile context.
 .P
 The \fIoptions\fP argument contains various bit settings that affect the
 compilation. It should be zero if no options are required. The available
@@ -1235,7 +1233,7 @@
 \fBpcre2pattern\fP
 .\"
 page. If you set PCRE2_UCP, matching one of the items it affects takes much
-longer. The option is available only if PCRE2 has been compiled with UTF
+longer. The option is available only if PCRE2 has been compiled with Unicode
 support.
 .sp
   PCRE2_UNGREEDY
@@ -1248,9 +1246,10 @@
 .sp
 This option causes PCRE2 to regard both the pattern and the subject strings
 that are subsequently processed as strings of UTF characters instead of
-single-code-unit strings. However, it is available only when PCRE2 is built to
-include UTF support. If not, the use of this option provokes an error. Details
-of how this option changes the behaviour of PCRE2 are given in the
+single-code-unit strings. It is available when PCRE2 is built to include
+Unicode support (which is the default). If Unicode support is not available,
+the use of this option provokes an error. Details of how this option changes
+the behaviour of PCRE2 are given in the
 .\" HREF
 \fBpcre2unicode\fP
 .\"
@@ -1314,13 +1313,12 @@
 .sp
 PCRE2 handles caseless matching, and determines whether characters are letters,
 digits, or whatever, by reference to a set of tables, indexed by character code
-point. When running in UTF-8 mode, or using the 16-bit or 32-bit libraries,
-this applies only to characters with code points less than 256. By default,
-higher-valued code points never match escapes such as \ew or \ed. However, if
-PCRE2 is built with UTF support, all characters can be tested with \ep and \eP,
-or, alternatively, the PCRE2_UCP option can be set when a pattern is compiled;
-this causes \ew and friends to use Unicode property support instead of the
-built-in tables.
+point. This applies only to characters whose code points are less than 256. By
+default, higher-valued code points never match escapes such as \ew or \ed.
+However, if PCRE2 is built with UTF support, all characters can be tested with
+\ep and \eP, or, alternatively, the PCRE2_UCP option can be set when a pattern
+is compiled; this causes \ew and friends to use Unicode property support
+instead of the built-in tables.
 .P
 The use of locales with Unicode is discouraged. If you are handling characters
 with code points greater than 128, you should either use Unicode support, or
@@ -1433,9 +1431,9 @@
   PCRE2_INFO_BSR
 .sp
 The output is a uint32_t whose value indicates what character sequences the \eR
-escape sequence matches by default. A value of 0 means that \eR matches any
-Unicode line ending sequence; a value of 1 means that \eR matches only CR, LF,
-or CRLF. The default can be overridden when a pattern is matched.
+escape sequence matches. A value of PCRE2_BSR_UNICODE means that \eR matches
+any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means that \eR
+matches only CR, LF, or CRLF.
 .sp
   PCRE2_INFO_CAPTURECOUNT
 .sp
@@ -1623,18 +1621,17 @@
 .sp
   PCRE2_INFO_NEWLINE
 .sp
-The output is a \fBuint32_t\fP whose value specifies the default character
-sequence that will be recognized as meaning "newline" while matching. The
-values are:
+The output is a \fBuint32_t\fP with one of the following values: 
 .sp
-  1  Carriage return (CR)
-  2  Linefeed (LF)
-  3  Carriage return, linefeed (CRLF)
-  4  Any Unicode line ending
-  5  Any of CR, LF, or CRLF
+  PCRE2_NEWLINE_CR       Carriage return (CR)
+  PCRE2_NEWLINE_LF       Linefeed (LF)
+  PCRE2_NEWLINE_CRLF     Carriage return, linefeed (CRLF)
+  PCRE2_NEWLINE_ANY      Any Unicode line ending
+  PCRE2_NEWLINE_ANYCRLF  Any of CR, LF, or CRLF
+.sp  
+This specifies the default character sequence that will be recognized as
+meaning "newline" while matching.
 .sp
-The default can be overridden when a pattern is matched.
-.sp
   PCRE2_INFO_RECURSIONLIMIT
 .sp
 If the pattern set a recursion limit by including an item of the form
@@ -1671,30 +1668,32 @@
 data block, which is an opaque structure that is accessed by function calls. In
 particular, the match data block contains a vector of offsets into the subject
 string that define the matched part of the subject and any substrings that were
-capured. This is know as the \fIovector\fP.
+captured. This is known as the \fIovector\fP.
 .P
-Before calling \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP you must create a
-match data block by calling one of the creation functions above. For
-\fBpcre2_match_data_create()\fP, the first argument is the number of pairs of
-offsets in the \fIovector\fP. One pair of offsets is required to identify the
-string that matched the whole pattern, with another pair for each captured
-substring. For example, a value of 4 creates enough space to record the matched
-portion of the subject plus three captured substrings. A minimum of at least 1
-pair is imposed by \fBpcre2_match_data_create()\fP, so it is always possible to
-return the overall matched string.
+Before calling \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or 
+\fBpcre2_jit_match()\fP you must create a match data block by calling one of
+the creation functions above. For \fBpcre2_match_data_create()\fP, the first
+argument is the number of pairs of offsets in the \fIovector\fP. One pair of
+offsets is required to identify the string that matched the whole pattern, with
+another pair for each captured substring. For example, a value of 4 creates
+enough space to record the matched portion of the subject plus three captured
+substrings. A minimum of at least 1 pair is imposed by
+\fBpcre2_match_data_create()\fP, so it is always possible to return the overall
+matched string.
 .P
 For \fBpcre2_match_data_create_from_pattern()\fP, the first argument is a
 pointer to a compiled pattern. In this case the ovector is created to be
 exactly the right size to hold all the substrings a pattern might capture.
 .P
-The second argument of both these functions ia a pointer to a general context,
+The second argument of both these functions is a pointer to a general context,
 which can specify custom memory management for obtaining the memory for the
 match data block. If you are not using custom memory management, pass NULL.
 .P
 A match data block can be used many times, with the same or different compiled
 patterns. When it is no longer needed, it should be freed by calling
-\fBpcre2_match_data_free()\fP. How to extract information from a match data
-block after a match operation is described in the sections on
+\fBpcre2_match_data_free()\fP. You can extract information from a match data
+block after a match operation has finished, using functions that are described
+in the sections on
 .\" HTML <a href="#matchedstrings">
 .\" </a>
 matched strings
@@ -1819,12 +1818,10 @@
 PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK,
 PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
 .P
-If the pattern was successfully processed by the just-in-time (JIT) compiler,
-the only supported options for matching using the JIT code are PCRE2_NOTBOL,
-PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK,
-PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. If an unsupported option is used,
-JIT matching is disabled and the normal interpretive code in
-\fBpcre2_match()\fP is run.
+Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
+compiler. If it is set, JIT matching is disabled and the normal interpretive
+code in \fBpcre2_match()\fP is run. The remaining options are supported for JIT 
+matching.
 .sp
   PCRE2_ANCHORED
 .sp
@@ -2704,6 +2701,6 @@
 .rs
 .sp
 .nf
-Last updated: 11 November 2014
+Last updated: 18 November 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi


Modified: code/trunk/maint/ManyConfigTests
===================================================================
--- code/trunk/maint/ManyConfigTests    2014-11-18 18:31:39 UTC (rev 152)
+++ code/trunk/maint/ManyConfigTests    2014-11-18 18:32:12 UTC (rev 153)
@@ -58,8 +58,8 @@


# If the compiler is gcc, add a lot of warning switches.

-cc --version >zzz 2>/dev/null
-if [ $? -eq 0 ] && grep GCC zzz >/dev/null; then
+cc --version >/tmp/pcre2ccversion 2>/dev/null
+if [ $? -eq 0 ] && grep GCC /tmp/pcre2ccversion >/dev/null; then
ISGCC=1
CFLAGS="$CFLAGS -Wall"
CFLAGS="$CFLAGS -Wno-overlength-strings"
@@ -77,8 +77,8 @@
CFLAGS="$CFLAGS -Wmissing-prototypes"
CFLAGS="$CFLAGS -Wstrict-prototypes"
fi
+rm -f /tmp/pcre2ccversion

-
# This function runs a single test with the set of configuration options that
# are in $opts. The source directory must be set in srcdir. The function must
# be defined as "runtest()" not "function runtest()" in order to run on
@@ -129,8 +129,6 @@

./pcre2test -C jit >/dev/null
jit=$?
- ./pcre2test -C unicode >/dev/null
- utf=$?
./pcre2test -C pcre2-8 >/dev/null
pcre2_8=$?

@@ -164,7 +162,7 @@
     echo "Skipping pcre2grep tests: newline is $nl"
   fi


-  if [ "$jit" -gt 0 -a $utf -gt 0 ]; then
+  if [ "$jit" -gt 0 ]; then
     echo "Running JIT regression tests $withvalgrind"
     $cvalgrind $srcdir/pcre2_jit_test >teststdout 2>teststderr
     if [ $? -ne 0 -o -s teststderr ]; then
@@ -175,7 +173,7 @@
       exit 1
     fi
   else
-    echo "Skipping JIT regression tests: JIT or UTF not enabled"
+    echo "Skipping JIT regression tests: JIT is not enabled"
   fi
   }
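
The ManyConfigTests hunks above write the output of "cc --version" to a fixed
path under /tmp and remove it once the GCC check is done. A minimal sketch of
the same check using a uniquely named temporary file follows. The function
name detect_gcc, its optional compiler argument, and the use of mktemp are
assumptions made for illustration; they are not part of the committed script.

```shell
#!/bin/sh
# Hypothetical helper, not part of ManyConfigTests: set ISGCC to 1 if the
# given compiler identifies itself as GCC, otherwise to 0. The compiler
# command defaults to "cc", as in the script.
detect_gcc() {
  compiler=${1:-cc}
  # mktemp creates a uniquely named file, avoiding a fixed name in /tmp.
  versionfile=$(mktemp) || return 2
  ISGCC=0
  # Same test as the script: the command must succeed and mention GCC.
  if "$compiler" --version >"$versionfile" 2>/dev/null &&
     grep GCC "$versionfile" >/dev/null; then
    ISGCC=1
  fi
  # Remove the temporary file whether or not GCC was detected.
  rm -f "$versionfile"
  return 0
}

detect_gcc cc
echo "ISGCC=$ISGCC"
```

Taking the compiler command as an argument also lets the check be exercised
with a stand-in script instead of a real compiler.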



Modified: code/trunk/maint/README
===================================================================
--- code/trunk/maint/README    2014-11-18 18:31:39 UTC (rev 152)
+++ code/trunk/maint/README    2014-11-18 18:32:12 UTC (rev 153)
@@ -65,7 +65,7 @@


When there is a new release of Unicode, the files in Unicode.tables must be
refreshed from the web site. If the new version of Unicode adds new character
-scripts, the source file pacr2_ucp.h and both the MultiStage2.py and the
+scripts, the source file pcre2_ucp.h and both the MultiStage2.py and the
GenerateUtt.py scripts must be edited to add the new names. Then MultiStage2.py
can be run to generate a new version of pcre2_ucd.c, and GenerateUtt.py can be
run to generate the tricky tables for inclusion in pcre2_tables.c.
@@ -73,7 +73,7 @@
If MultiStage2.py gives the error "ValueError: list.index(x): x not in list",
the cause is usually a missing (or misspelt) name in the list of scripts. I
couldn't find a straightforward list of scripts on the Unicode site, but
-there's a useful Wikipedia page that list them, and notes the Unicode version
+there's a useful Wikipedia page that lists them, and notes the Unicode version
in which they were introduced:

http://en.wikipedia.org/wiki/Unicode_scripts#Table_of_Unicode_scripts
@@ -130,7 +130,7 @@
systems, using different compilers as well. For example, on Solaris it is
helpful to test using Sun's cc compiler as a change from gcc. Adding
-xarch=v9 to the cc options does a 64-bit test, but it also needs -S 64 for
- pcretest to increase the stack size for test 2. Since I retired I can no
+ pcre2test to increase the stack size for test 2. Since I retired I can no
longer do this, but instead I rely on putting out release candidates for
folks on the pcre-dev list to test.

@@ -194,7 +194,7 @@
copy:

   svn copy svn://vcs.exim.org/pcre2/code/trunk \
-           svn://vcs.exim.org/pcre2/code/tags/pcre-8.xx
+           svn://vcs.exim.org/pcre2/code/tags/pcre-10.xx


When the new release is out, don't forget to tell webmaster@??? and the
mailing list. Also, update the list of version numbers in Bugzilla (edit
@@ -206,8 +206,7 @@

This section records a list of ideas so that they do not get forgotten. They
vary enormously in their usefulness and potential for implementation. Some are
-very sensible; some are rather wacky. Some have been on this list for years;
-others are relatively new.
+very sensible; some are rather wacky. Some have been on this list for years.

. Optimization

@@ -226,42 +225,38 @@
     over the existing "required code unit" feature that just remembers one code
     unit.


- * Remember an initial string rather than just 1 code unit?
+ * Remember an initial string rather than just 1 code unit.

   * A required code unit from alternatives - not just the last unit, but an
     earlier one if common to all alternatives.


- o Friedl contains other ideas.
+ * Friedl contains other ideas.

   * The code does not set initial code unit flags for Unicode property types
     such as \p; I don't know how much benefit there would be for, for example,
     setting the bits for 0-9 and all values >= xC0 (in 8-bit mode) when a
     pattern starts with \p{N}.


- * There is scope for more "auto-possessifying" in connection with \p and \P.
-
. If Perl gets to a consistent state over the settings of capturing sub-
patterns inside repeats, see if we can match it. One example of the
- difference is the matching of /(main(O)?)+/ against mainOmain, where PCRE
- leaves $2 set. In Perl, it's unset. Changing this in PCRE will be very hard
+ difference is the matching of /(main(O)?)+/ against mainOmain, where PCRE2
+ leaves $2 set. In Perl, it's unset. Changing this in PCRE2 will be very hard
because I think it needs much more state to be remembered.

. Perl 6 will be a revolution. Is it a revolution too far for PCRE?

-. Line endings:
+. An option to use NUL as a line terminator in subject strings. This could be
+ done relatively easily. If it is done, a suitable option for pcre2grep is
+ also required.

-  * Option to use NUL as a line terminator in subject strings. This could now
-    be done relatively easily since the extension to support LF, CR, and CRLF.
-    If it is done, a suitable option for pcre2grep is also required.
-
 . Catch SIGSEGV for stack overflows?


. A feature to suspend a match via a callout was once requested.

-. Option to convert results into character offsets and character lengths.
+. An option to convert results into character offsets and character lengths.

-. Option for pcre2grep to scan only the start of a file. I am not keen - this
- is the job of "head".
+. An option for pcre2grep to scan only the start of a file. I am not keen -
+ this is the job of "head".

. A (non-Unix) user wanted pcregrep options to (a) list a file name just once,
preceded by a blank line, instead of adding it to every matched line, and (b)
@@ -274,11 +269,6 @@
to switch this dynamically. It would have to be specified when PCRE2 was
compiled. PCRE2 would then call a function every time it wanted a character.

-. Wild thought: the ability to compile from PCRE2's internal code to a real
- FSM and a very fast (third) matcher to process the result. There would be
- even more restrictions than for pcre2_dfa_exec(), however. This is not easy.
- This is probably obsolete now that we have the JIT support.
-
. pcre2grep: add -rs for a sorted recurse? Having to store file names and sort
them will of course slow it down.

@@ -296,10 +286,10 @@
pattern.

. Pcre2grep: an option to specify the output line separator, either as a string
- or select from a fixed list. This is not dead easy, because at the moment it
- outputs whatever is in the input file.
+ or select from a fixed list. This is not straightforward, because at the
+ moment it outputs whatever is in the input file.

-. Improve the code for duplicate checking in pcre_dfa_exec(). An incomplete,
+. Improve the code for duplicate checking in pcre2_dfa_match(). An incomplete,
non-thread-safe patch showed that this can help performance for patterns
where there are many alternatives. However, a simple thread-safe
implementation that I tried made things worse in many simple cases, so this
@@ -308,8 +298,7 @@
. PCRE2 cannot at present distinguish between subpatterns with different names,
but the same number (created by the use of ?|). In order to do so, a way of
remembering *which* subpattern numbered n matched is needed. Bugzilla #760.
- Now that (*MARK) has been implemented, it can perhaps be used as a way round
- this problem.
+ (*MARK) can perhaps be used as a way round this problem.

. Instead of having #ifdef HAVE_CONFIG_H in each module, put #include
"something" and the #ifdef appears only in one place, in "something".
@@ -317,4 +306,4 @@
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 25 October 2014
+Last updated: 18 November 2014