Revision: 576
http://vcs.pcre.org/viewvc?view=rev&revision=576
Author: ph10
Date: 2010-11-21 18:45:10 +0000 (Sun, 21 Nov 2010)
Log Message:
-----------
Added support for (*NO_START_OPT)
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/pcreapi.3
code/trunk/doc/pcrepattern.3
code/trunk/doc/pcresyntax.3
code/trunk/doc/pcretest.1
code/trunk/pcre.h.in
code/trunk/pcre_compile.c
code/trunk/pcre_dfa_exec.c
code/trunk/pcre_exec.c
code/trunk/pcre_internal.h
code/trunk/pcretest.c
code/trunk/testdata/testinput2
code/trunk/testdata/testinput7
code/trunk/testdata/testoutput2
code/trunk/testdata/testoutput7
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/ChangeLog 2010-11-21 18:45:10 UTC (rev 576)
@@ -113,6 +113,13 @@
compile-time error is now given if \c is not followed by an ASCII
character, that is, a byte less than 128. (In EBCDIC mode, the code is
different, and any byte value is allowed.)
+
+20. Recognize (*NO_START_OPT) at the start of a pattern to set the
+ PCRE_NO_START_OPTIMIZE option, which is now allowed at compile time but is
+ just passed through to pcre_exec() or pcre_dfa_exec(). This makes it
+ available to pcregrep and other applications that have no direct access to
+ PCRE options. The new /Y option in pcretest sets this option when calling
+ pcre_compile().
Version 8.10 25-Jun-2010
Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/doc/pcreapi.3 2010-11-21 18:45:10 UTC (rev 576)
@@ -428,8 +428,9 @@
documentation). For those options that can be different in different parts of
the pattern, the contents of the \fIoptions\fP argument specifies their
settings at the start of compilation and execution. The PCRE_ANCHORED,
-PCRE_BSR_\fIxxx\fP, and PCRE_NEWLINE_\fIxxx\fP options can be set at the time
-of matching as well as at compile time.
+PCRE_BSR_\fIxxx\fP, PCRE_NEWLINE_\fIxxx\fP, PCRE_NO_UTF8_CHECK, and
+PCRE_NO_START_OPTIMIZE options can be set at the time of matching as well as at
+compile time.
.P
If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.
Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns
@@ -658,6 +659,17 @@
they acquire numbers in the usual way). There is no equivalent of this option
in Perl.
.sp
+ PCRE_NO_START_OPTIMIZE
+.sp
+This is an option that acts at matching time; that is, it is really an option
+for \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. If it is set at compile time,
+it is remembered with the compiled pattern and assumed at matching time. For
+details see the discussion of PCRE_NO_START_OPTIMIZE
+.\" HTML <a href="#execoptions">
+.\" </a>
+below.
+.\"
+.sp
PCRE_UCP
.sp
This option changes the way PCRE processes \eB, \eb, \eD, \ed, \eS, \es, \eW,
@@ -1487,7 +1499,10 @@
The PCRE_NO_START_OPTIMIZE option disables the start-up optimizations, possibly
causing performance to suffer, but ensuring that in cases where the result is
"no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
-are considered at every possible starting position in the subject string.
+are considered at every possible starting position in the subject string. If
+PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching
+time.
+.P
Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.
Consider the pattern
.sp
@@ -2252,6 +2267,6 @@
.rs
.sp
.nf
-Last updated: 13 November 2010
+Last updated: 21 November 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/doc/pcrepattern.3 2010-11-21 18:45:10 UTC (rev 576)
@@ -52,6 +52,11 @@
instead of recognizing only characters with codes less than 128 via a lookup
table.
.P
+If a pattern starts with (*NO_START_OPT), it has the same effect as setting the
+PCRE_NO_START_OPTIMIZE option either at compile or matching time. There are
+also similar special sequences that control the handling of newlines; they
+are described below.
+.P
The remainder of this document discusses the patterns that are supported by
PCRE when its main matching function, \fBpcre_exec()\fP, is used.
From release 6.0, PCRE offers a second matching function,
Modified: code/trunk/doc/pcresyntax.3
===================================================================
--- code/trunk/doc/pcresyntax.3 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/doc/pcresyntax.3 2010-11-21 18:45:10 UTC (rev 576)
@@ -24,7 +24,7 @@
.rs
.sp
\ea alarm, that is, the BEL character (hex 07)
- \ecx "control-x", where x is any character
+ \ecx "control-x", where x is any ASCII character
\ee escape (hex 1B)
\ef formfeed (hex 0C)
\en newline (hex 0A)
@@ -336,6 +336,7 @@
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
.sp
+ (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
(*UTF8) set UTF-8 mode (PCRE_UTF8)
(*UCP) set PCRE_UCP (use Unicode properties for \ed etc)
.
@@ -473,6 +474,6 @@
.rs
.sp
.nf
-Last updated: 12 May 2010
+Last updated: 21 November 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
Modified: code/trunk/doc/pcretest.1
===================================================================
--- code/trunk/doc/pcretest.1 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/doc/pcretest.1 2010-11-21 18:45:10 UTC (rev 576)
@@ -179,6 +179,7 @@
\fB/U\fP PCRE_UNGREEDY
\fB/W\fP PCRE_UCP
\fB/X\fP PCRE_EXTRA
+ \fB/Y\fP PCRE_NO_START_OPTIMIZE
\fB/<JS>\fP PCRE_JAVASCRIPT_COMPAT
\fB/<cr>\fP PCRE_NEWLINE_CR
\fB/<lf>\fP PCRE_NEWLINE_LF
@@ -778,6 +779,6 @@
.rs
.sp
.nf
-Last updated: 07 November 2010
+Last updated: 21 November 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
Modified: code/trunk/pcre.h.in
===================================================================
--- code/trunk/pcre.h.in 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/pcre.h.in 2010-11-21 18:45:10 UTC (rev 576)
@@ -129,7 +129,7 @@
#define PCRE_BSR_ANYCRLF 0x00800000 /* Compile, exec, DFA exec */
#define PCRE_BSR_UNICODE 0x01000000 /* Compile, exec, DFA exec */
#define PCRE_JAVASCRIPT_COMPAT 0x02000000 /* Compile */
-#define PCRE_NO_START_OPTIMIZE 0x04000000 /* Exec, DFA exec */
+#define PCRE_NO_START_OPTIMIZE 0x04000000 /* Compile, exec, DFA exec */
#define PCRE_NO_START_OPTIMISE 0x04000000 /* Synonym */
#define PCRE_PARTIAL_HARD 0x08000000 /* Exec, DFA exec */
#define PCRE_NOTEMPTY_ATSTART 0x10000000 /* Exec, DFA exec */
Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/pcre_compile.c 2010-11-21 18:45:10 UTC (rev 576)
@@ -6859,6 +6859,8 @@
{ skipatstart += 7; options |= PCRE_UTF8; continue; }
else if (strncmp((char *)(ptr+skipatstart+2), STRING_UCP_RIGHTPAR, 4) == 0)
{ skipatstart += 6; options |= PCRE_UCP; continue; }
+ else if (strncmp((char *)(ptr+skipatstart+2), STRING_NO_START_OPT_RIGHTPAR, 13) == 0)
+ { skipatstart += 15; options |= PCRE_NO_START_OPTIMIZE; continue; }
if (strncmp((char *)(ptr+skipatstart+2), STRING_CR_RIGHTPAR, 3) == 0)
{ skipatstart += 5; newnl = PCRE_NEWLINE_CR; }
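The hunk above extends the chain of strncmp() tests that consumes leading
"(*NAME)" items before the pattern proper. The following is a minimal,
table-driven sketch of the same idea, not the code in pcre_compile.c. The
function name scan_leading_verbs and the VERB_* flag values are hypothetical
and exist only for this illustration; the newline and BSR items are left out.

```c
#include <string.h>

/* Hypothetical flag values for this sketch only; the real ones are in pcre.h. */
#define VERB_UTF8              0x1u
#define VERB_UCP               0x2u
#define VERB_NO_START_OPTIMIZE 0x4u

/* Consume leading "(*NAME)" items from the pattern, OR the matching flag
   into *options, and return the number of bytes skipped. Scanning stops at
   the first position where no item from the table matches. */
static size_t scan_leading_verbs(const char *pattern, unsigned *options)
{
  static const struct { const char *text; unsigned flag; } verbs[] = {
    { "(*UTF8)",         VERB_UTF8 },
    { "(*UCP)",          VERB_UCP },
    { "(*NO_START_OPT)", VERB_NO_START_OPTIMIZE },
  };
  const size_t n = sizeof verbs / sizeof verbs[0];
  size_t skip = 0;

  for (;;) {
    size_t i;
    for (i = 0; i < n; i++) {
      size_t len = strlen(verbs[i].text);
      if (strncmp(pattern + skip, verbs[i].text, len) == 0) {
        *options |= verbs[i].flag;
        skip += len;
        break;
      }
    }
    if (i == n) return skip;  /* nothing matched: the pattern proper starts here */
  }
}
```

For "(*NO_START_OPT)xyz" the sketch skips 15 bytes, which agrees with the
"skipatstart += 15" in the hunk and with the +15 callout offsets in the new
test output.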
Modified: code/trunk/pcre_dfa_exec.c
===================================================================
--- code/trunk/pcre_dfa_exec.c 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/pcre_dfa_exec.c 2010-11-21 18:45:10 UTC (rev 576)
@@ -3057,9 +3057,11 @@
/* There are some optimizations that avoid running the match if a known
starting point is not found. However, there is an option that disables
- these, for testing and for ensuring that all callouts do actually occur. */
+ these, for testing and for ensuring that all callouts do actually occur.
+ The option can be set in the regex by (*NO_START_OPT) or passed in
+ match-time options. */
- if ((options & PCRE_NO_START_OPTIMIZE) == 0)
+ if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0)
{
/* Advance to a known first byte. */
Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/pcre_exec.c 2010-11-21 18:45:10 UTC (rev 576)
@@ -5936,9 +5936,10 @@
/* There are some optimizations that avoid running the match if a known
starting point is not found, or if a known later character is not present.
However, there is an option that disables these, for testing and for ensuring
- that all callouts do actually occur. */
+ that all callouts do actually occur. The option can be set in the regex by
+ (*NO_START_OPT) or passed in match-time options. */
- if ((options & PCRE_NO_START_OPTIMIZE) == 0)
+ if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0)
{
/* Advance to a unique first byte if there is one. */
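The changed test in pcre_exec.c and pcre_dfa_exec.c ORs the match-time options
with the options stored in the compiled pattern, so the flag takes effect if
either side sets it. A minimal stand-alone sketch of that check follows. The
struct and function names are hypothetical stand-ins, not PCRE definitions;
only the flag value is taken from pcre.h.in above.

```c
/* Same value as PCRE_NO_START_OPTIMIZE in pcre.h.in; renamed for the sketch. */
#define SKETCH_NO_START_OPTIMIZE 0x04000000u

/* Hypothetical stand-in for the compiled pattern; plays the role of re->options. */
struct sk_compiled { unsigned options; };

/* Returns 1 when the start-of-match optimizations may run, 0 otherwise.
   The flag must be absent from both the compile-time and match-time options. */
static int start_optimizations_enabled(const struct sk_compiled *re,
                                       unsigned exec_options)
{
  return ((exec_options | re->options) & SKETCH_NO_START_OPTIMIZE) == 0;
}
```

Because the two option words are ORed, a flag recorded at compile time cannot
be cleared by the caller at matching time, which is the behaviour the
pcreapi.3 change documents.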
Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/pcre_internal.h 2010-11-21 18:45:10 UTC (rev 576)
@@ -615,7 +615,7 @@
PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8| \
PCRE_NO_AUTO_CAPTURE|PCRE_NO_UTF8_CHECK|PCRE_AUTO_CALLOUT|PCRE_FIRSTLINE| \
PCRE_DUPNAMES|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \
- PCRE_JAVASCRIPT_COMPAT|PCRE_UCP)
+ PCRE_JAVASCRIPT_COMPAT|PCRE_UCP|PCRE_NO_START_OPTIMIZE)
#define PUBLIC_EXEC_OPTIONS \
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \
@@ -932,15 +932,16 @@
#define STRING_DEFINE "DEFINE"
-#define STRING_CR_RIGHTPAR "CR)"
-#define STRING_LF_RIGHTPAR "LF)"
-#define STRING_CRLF_RIGHTPAR "CRLF)"
-#define STRING_ANY_RIGHTPAR "ANY)"
-#define STRING_ANYCRLF_RIGHTPAR "ANYCRLF)"
-#define STRING_BSR_ANYCRLF_RIGHTPAR "BSR_ANYCRLF)"
-#define STRING_BSR_UNICODE_RIGHTPAR "BSR_UNICODE)"
-#define STRING_UTF8_RIGHTPAR "UTF8)"
-#define STRING_UCP_RIGHTPAR "UCP)"
+#define STRING_CR_RIGHTPAR "CR)"
+#define STRING_LF_RIGHTPAR "LF)"
+#define STRING_CRLF_RIGHTPAR "CRLF)"
+#define STRING_ANY_RIGHTPAR "ANY)"
+#define STRING_ANYCRLF_RIGHTPAR "ANYCRLF)"
+#define STRING_BSR_ANYCRLF_RIGHTPAR "BSR_ANYCRLF)"
+#define STRING_BSR_UNICODE_RIGHTPAR "BSR_UNICODE)"
+#define STRING_UTF8_RIGHTPAR "UTF8)"
+#define STRING_UCP_RIGHTPAR "UCP)"
+#define STRING_NO_START_OPT_RIGHTPAR "NO_START_OPT)"
#else /* SUPPORT_UTF8 */
@@ -1186,15 +1187,16 @@
#define STRING_DEFINE STR_D STR_E STR_F STR_I STR_N STR_E
-#define STRING_CR_RIGHTPAR STR_C STR_R STR_RIGHT_PARENTHESIS
-#define STRING_LF_RIGHTPAR STR_L STR_F STR_RIGHT_PARENTHESIS
-#define STRING_CRLF_RIGHTPAR STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
-#define STRING_ANY_RIGHTPAR STR_A STR_N STR_Y STR_RIGHT_PARENTHESIS
-#define STRING_ANYCRLF_RIGHTPAR STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
-#define STRING_BSR_ANYCRLF_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
-#define STRING_BSR_UNICODE_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_U STR_N STR_I STR_C STR_O STR_D STR_E STR_RIGHT_PARENTHESIS
-#define STRING_UTF8_RIGHTPAR STR_U STR_T STR_F STR_8 STR_RIGHT_PARENTHESIS
-#define STRING_UCP_RIGHTPAR STR_U STR_C STR_P STR_RIGHT_PARENTHESIS
+#define STRING_CR_RIGHTPAR STR_C STR_R STR_RIGHT_PARENTHESIS
+#define STRING_LF_RIGHTPAR STR_L STR_F STR_RIGHT_PARENTHESIS
+#define STRING_CRLF_RIGHTPAR STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
+#define STRING_ANY_RIGHTPAR STR_A STR_N STR_Y STR_RIGHT_PARENTHESIS
+#define STRING_ANYCRLF_RIGHTPAR STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
+#define STRING_BSR_ANYCRLF_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
+#define STRING_BSR_UNICODE_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_U STR_N STR_I STR_C STR_O STR_D STR_E STR_RIGHT_PARENTHESIS
+#define STRING_UTF8_RIGHTPAR STR_U STR_T STR_F STR_8 STR_RIGHT_PARENTHESIS
+#define STRING_UCP_RIGHTPAR STR_U STR_C STR_P STR_RIGHT_PARENTHESIS
+#define STRING_NO_START_OPT_RIGHTPAR STR_N STR_O STR_UNDERSCORE STR_S STR_T STR_A STR_R STR_T STR_UNDERSCORE STR_O STR_P STR_T STR_RIGHT_PARENTHESIS
#endif /* SUPPORT_UTF8 */
Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/pcretest.c 2010-11-21 18:45:10 UTC (rev 576)
@@ -1588,6 +1588,7 @@
case 'U': options |= PCRE_UNGREEDY; break;
case 'W': options |= PCRE_UCP; break;
case 'X': options |= PCRE_EXTRA; break;
+ case 'Y': options |= PCRE_NO_START_OPTIMISE; break;
case 'Z': debug_lengths = 0; break;
case '8': options |= PCRE_UTF8; use_utf8 = 1; break;
case '?': options |= PCRE_NO_UTF8_CHECK; break;
@@ -1924,7 +1925,7 @@
if (do_flip) all_options = byteflip(all_options, sizeof(all_options));
if (get_options == 0) fprintf(outfile, "No options\n");
- else fprintf(outfile, "Options:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+ else fprintf(outfile, "Options:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
((get_options & PCRE_ANCHORED) != 0)? " anchored" : "",
((get_options & PCRE_CASELESS) != 0)? " caseless" : "",
((get_options & PCRE_EXTENDED) != 0)? " extended" : "",
@@ -1940,6 +1941,7 @@
((get_options & PCRE_UTF8) != 0)? " utf8" : "",
((get_options & PCRE_UCP) != 0)? " ucp" : "",
((get_options & PCRE_NO_UTF8_CHECK) != 0)? " no_utf8_check" : "",
+ ((get_options & PCRE_NO_START_OPTIMIZE) != 0)? " no_start_optimize" : "",
((get_options & PCRE_DUPNAMES) != 0)? " dupnames" : "");
if (jchanged) fprintf(outfile, "Duplicate name status changes\n");
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/testdata/testinput2 2010-11-21 18:45:10 UTC (rev 576)
@@ -2584,7 +2584,13 @@
abc\Y
abcxypqr
abcxypqr\Y
+
+/(*NO_START_OPT)xyz/C
+ abcxyz
+/xyz/CY
+ abcxyz
+
/^"((?(?=[a])[^"])|b)*"$/C
"ab"
Modified: code/trunk/testdata/testinput7
===================================================================
--- code/trunk/testdata/testinput7 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/testdata/testinput7 2010-11-21 18:45:10 UTC (rev 576)
@@ -4411,6 +4411,9 @@
abc\Y
abcxypqr
abcxypqr\Y
+
+/(*NO_START_OPT)xyz/C
+ abcxyz
/(?C)ab/
ab
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/testdata/testoutput2 2010-11-21 18:45:10 UTC (rev 576)
@@ -9294,7 +9294,31 @@
+0 ^ x
+0 ^ x
No match
+
+/(*NO_START_OPT)xyz/C
+ abcxyz
+--->abcxyz
++15 ^ x
++15 ^ x
++15 ^ x
++15 ^ x
++16 ^^ y
++17 ^ ^ z
++18 ^ ^
+ 0: xyz
+/xyz/CY
+ abcxyz
+--->abcxyz
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
+ +1 ^^ y
+ +2 ^ ^ z
+ +3 ^ ^
+ 0: xyz
+
/^"((?(?=[a])[^"])|b)*"$/C
"ab"
--->"ab"
Modified: code/trunk/testdata/testoutput7
===================================================================
--- code/trunk/testdata/testoutput7 2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/testdata/testoutput7 2010-11-21 18:45:10 UTC (rev 576)
@@ -7319,6 +7319,18 @@
+0 ^ x
+0 ^ x
No match
+
+/(*NO_START_OPT)xyz/C
+ abcxyz
+--->abcxyz
++15 ^ x
++15 ^ x
++15 ^ x
++15 ^ x
++16 ^^ y
++17 ^ ^ z
++18 ^ ^
+ 0: xyz
/(?C)ab/
ab