Revision: 87
http://www.exim.org/viewvc/pcre2?view=rev&revision=87
Author: ph10
Date: 2014-10-01 17:16:27 +0100 (Wed, 01 Oct 2014)
Log Message:
-----------
Make PCRE2_NO_START_OPTIMIZE a compile-only option.
Modified Paths:
--------------
code/trunk/doc/pcre2api.3
code/trunk/doc/pcre2callout.3
code/trunk/doc/pcre2jit.3
code/trunk/doc/pcre2test.1
code/trunk/src/pcre2.h.in
code/trunk/src/pcre2_dfa_match.c
code/trunk/src/pcre2_match.c
code/trunk/src/pcre2test.c
code/trunk/testdata/testinput2
code/trunk/testdata/testinput6
code/trunk/testdata/testoutput2
code/trunk/testdata/testoutput6
Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3 2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/doc/pcre2api.3 2014-10-01 16:16:27 UTC (rev 87)
@@ -930,9 +930,8 @@
.P
For those options that can be different in different parts of the pattern, the
contents of the \fIoptions\fP argument specifies their settings at the start of
-compilation. The PCRE2_ANCHORED, PCRE2_NO_UTF_CHECK, and
-PCRE2_NO_START_OPTIMIZE options can be set at the time of matching as well as
-at compile time.
+compilation. The PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK options can be set at
+the time of matching as well as at compile time.
.P
Other, less frequently required compile-time parameters (for example, the
newline setting) can be provided in a compile context (as described
@@ -1150,18 +1149,53 @@
.sp
PCRE2_NO_START_OPTIMIZE
.sp
-This is an option that acts at matching time; that is, it is really an option
-for \fBpcre2_match()\fP or \fBpcre_dfa_match()\fP. If it is set at compile
-time, it is remembered with the compiled pattern and assumed at matching time.
-This is necessary if you want to use JIT execution, because the JIT compiler
-needs to know whether or not this option is set. For details, see the
-discussion of PCRE2_NO_START_OPTIMIZE in the section on \fBpcre2_match()\fP
-options
-.\" HTML <a href="#matchoptions">
-.\" </a>
-below.
-.\"
+This is an option whose main effect is at matching time. It does not change
+what \fBpcre2_compile()\fP generates, but it does affect the output of the JIT
+compiler.
+.P
+There are a number of optimizations that may occur at the start of a match, in
+order to speed up the process. For example, if it is known that an unanchored
+match must start with a specific character, the matching code searches the
+subject for that character, and fails immediately if it cannot find it, without
+actually running the main matching function. This means that a special item
+such as (*COMMIT) at the start of a pattern is not considered until after a
+suitable starting point for the match has been found. Also, when callouts or
+(*MARK) items are in use, these "start-up" optimizations can cause them to be
+skipped if the pattern is never actually used. The start-up optimizations are
+in effect a pre-scan of the subject that takes place before the pattern is run.
+.P
+The PCRE2_NO_START_OPTIMIZE option disables the start-up optimizations,
+possibly causing performance to suffer, but ensuring that in cases where the
+result is "no match", the callouts do occur, and that items such as (*COMMIT)
+and (*MARK) are considered at every possible starting position in the subject
+string.
+.P
+Setting PCRE2_NO_START_OPTIMIZE may change the outcome of a matching operation.
+Consider the pattern
.sp
+ (*COMMIT)ABC
+.sp
+When this is compiled, PCRE2 records the fact that a match must start with the
+character "A". Suppose the subject string is "DEFABC". The start-up
+optimization scans along the subject, finds "A" and runs the first match
+attempt from there. The (*COMMIT) item means that the pattern must match the
+current starting position, which in this case, it does. However, if the same
+match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
+subject string does not happen. The first match attempt is run starting from
+"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
+the overall result is "no match". There are also other start-up optimizations.
+For example, a minimum length for the subject may be recorded. Consider the
+pattern
+.sp
+ (*MARK:A)(X|Y)
+.sp
+The minimum length for a match is one character. If the subject is "ABC", there
+will be attempts to match "ABC", "BC", and "C". An attempt to match an empty
+string at the end of the subject does not take place, because PCRE2 knows that
+the subject is now too short, and so the (*MARK) is never encountered. In this
+case, the optimization does not affect the overall match result, which is still
+"no match", but it does affect the auxiliary information that is returned.
+.sp
PCRE2_NO_UTF_CHECK
.sp
When PCRE2_UTF is set, the validity of the pattern as a UTF string is
@@ -1787,10 +1821,9 @@
.rs
.sp
The unused bits of the \fIoptions\fP argument for \fBpcre2_match()\fP must be
-zero. The only bits that may be set are PCRE2_ANCHORED,
-PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_START_OPTIMIZE, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
-PCRE2_PARTIAL_SOFT. Their action is described below.
+zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_NOTBOL,
+PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK,
+PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
.P
If the pattern was successfully processed by the just-in-time (JIT) compiler,
the only supported options for matching using the JIT code are PCRE2_NOTBOL,
@@ -1841,54 +1874,6 @@
the start of the subject is permitted. If the pattern is anchored, such a match
can occur only if the pattern contains \eK.
.sp
- PCRE2_NO_START_OPTIMIZE
-.sp
-There are a number of optimizations that \fBpcre2_match()\fP uses at the start
-of a match, in order to speed up the process. For example, if it is known that
-an unanchored match must start with a specific character, it searches the
-subject for that character, and fails immediately if it cannot find it, without
-actually running the main matching function. This means that a special item
-such as (*COMMIT) at the start of a pattern is not considered until after a
-suitable starting point for the match has been found. Also, when callouts or
-(*MARK) items are in use, these "start-up" optimizations can cause them to be
-skipped if the pattern is never actually used. The start-up optimizations are
-in effect a pre-scan of the subject that takes place before the pattern is run.
-.P
-The PCRE2_NO_START_OPTIMIZE option disables the start-up optimizations,
-possibly causing performance to suffer, but ensuring that in cases where the
-result is "no match", the callouts do occur, and that items such as (*COMMIT)
-and (*MARK) are considered at every possible starting position in the subject
-string. If PCRE2_NO_START_OPTIMIZE is set at compile time, it cannot be unset
-at matching time. The use of PCRE2_NO_START_OPTIMIZE at matching time (that is,
-passing it to \fBpcre2_match()\fP) disables JIT execution; in this situation,
-matching is always done using interpretively.
-.P
-Setting PCRE2_NO_START_OPTIMIZE can change the outcome of a matching operation.
-Consider the pattern
-.sp
- (*COMMIT)ABC
-.sp
-When this is compiled, PCRE2 records the fact that a match must start with the
-character "A". Suppose the subject string is "DEFABC". The start-up
-optimization scans along the subject, finds "A" and runs the first match
-attempt from there. The (*COMMIT) item means that the pattern must match the
-current starting position, which in this case, it does. However, if the same
-match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
-subject string does not happen. The first match attempt is run starting from
-"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
-the overall result is "no match". There are also other start-up optimizations.
-For example, a minimum length for the subject may be recorded. Consider the
-pattern
-.sp
- (*MARK:A)(X|Y)
-.sp
-The minimum length for a match is one character. If the subject is "ABC", there
-will be attempts to match "ABC", "BC", and "C". An attempt to match an empty
-string at the end of the subject does not take place, because PCRE2 knows that
-the subject is now too short, and so the (*MARK) is never encountered. In this
-case, the optimization does not affect the overall match result, which is still
-"no match", but it does affect the auxiliary information that is returned.
-.sp
PCRE2_NO_UTF_CHECK
.sp
When PCRE2_UTF is set at compile time, the validity of the subject as a UTF
@@ -2550,10 +2535,9 @@
The unused bits of the \fIoptions\fP argument for \fBpcre2_dfa_match()\fP must
be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_NOTBOL,
PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK,
-PCRE2_NO_START_OPTIMIZE, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT,
-PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last four of these are
-exactly the same as for \fBpcre2_match()\fP, so their description is not
-repeated here.
+PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and
+PCRE2_DFA_RESTART. All but the last four of these are exactly the same as for
+\fBpcre2_match()\fP, so their description is not repeated here.
.sp
PCRE2_PARTIAL_HARD
PCRE2_PARTIAL_SOFT
Modified: code/trunk/doc/pcre2callout.3
===================================================================
--- code/trunk/doc/pcre2callout.3 2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/doc/pcre2callout.3 2014-10-01 16:16:27 UTC (rev 87)
@@ -111,7 +111,7 @@
long enough, or, for unanchored patterns, if it has been scanned far enough.
.P
You can disable these optimizations by passing the PCRE2_NO_START_OPTIMIZE
-option to the matching function, or by starting the pattern with
+option to \fBpcre2_compile()\fP, or by starting the pattern with
(*NO_START_OPT). This slows down the matching process, but does ensure that
callouts such as the example above are obeyed.
.
Modified: code/trunk/doc/pcre2jit.3
===================================================================
--- code/trunk/doc/pcre2jit.3 2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/doc/pcre2jit.3 2014-10-01 16:16:27 UTC (rev 87)
@@ -107,9 +107,8 @@
.sp
The \fBpcre2_match()\fP options that are supported for JIT matching are
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The options
-that are not supported at match time are PCRE2_ANCHORED and
-PCRE2_NO_START_OPTIMIZE, though they are supported if given at compile time.
+PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
+PCRE2_ANCHORED option is not supported at match time.
.P
The only unsupported pattern items are \eC (match a single data unit) when
running in a UTF mode, and a callout immediately before an assertion condition
Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1 2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/doc/pcre2test.1 2014-10-01 16:16:27 UTC (rev 87)
@@ -662,7 +662,6 @@
anchored set PCRE2_ANCHORED
dfa_restart set PCRE2_DFA_RESTART
dfa_shortest set PCRE2_DFA_SHORTEST
- no_start_optimize set PCRE2_NO_START_OPTIMIZE
no_utf_check set PCRE2_NO_UTF_CHECK
notbol set PCRE2_NOTBOL
notempty set PCRE2_NOTEMPTY
Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in 2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/src/pcre2.h.in 2014-10-01 16:16:27 UTC (rev 87)
@@ -86,8 +86,7 @@
others can be added next to them */
#define PCRE2_ANCHORED 0x80000000u
-#define PCRE2_NO_START_OPTIMIZE 0x40000000u
-#define PCRE2_NO_UTF_CHECK 0x20000000u
+#define PCRE2_NO_UTF_CHECK 0x40000000u
/* Other options that can be passed to pcre2_compile(). They may affect
compilation, JIT compilation, and/or interpretive execution. The following tags
@@ -95,7 +94,7 @@
C alters what is compiled
J alters what JIT compiles
-E is inspected during pcre2_match() execution
+M is inspected during pcre2_match() execution
D is inspected during pcre2_dfa_match() execution
*/
@@ -103,20 +102,21 @@
#define PCRE2_ALT_BSUX 0x00000002u /* C */
#define PCRE2_AUTO_CALLOUT 0x00000004u /* C */
#define PCRE2_CASELESS 0x00000008u /* C */
-#define PCRE2_DOLLAR_ENDONLY 0x00000010u /* J E D */
+#define PCRE2_DOLLAR_ENDONLY 0x00000010u /* J M D */
#define PCRE2_DOTALL 0x00000020u /* C */
#define PCRE2_DUPNAMES 0x00000040u /* C */
#define PCRE2_EXTENDED 0x00000080u /* C */
-#define PCRE2_FIRSTLINE 0x00000100u /* J E D */
-#define PCRE2_MATCH_UNSET_BACKREF 0x00000200u /* C J E */
+#define PCRE2_FIRSTLINE 0x00000100u /* J M D */
+#define PCRE2_MATCH_UNSET_BACKREF 0x00000200u /* C J M */
#define PCRE2_MULTILINE 0x00000400u /* C */
#define PCRE2_NEVER_UCP 0x00000800u /* C */
#define PCRE2_NEVER_UTF 0x00001000u /* C */
#define PCRE2_NO_AUTO_CAPTURE 0x00002000u /* C */
#define PCRE2_NO_AUTO_POSSESS 0x00004000u /* C */
-#define PCRE2_UCP 0x00008000u /* C J E D */
-#define PCRE2_UNGREEDY 0x00010000u /* C */
-#define PCRE2_UTF 0x00020000u /* C J E D */
+#define PCRE2_NO_START_OPTIMIZE 0x00008000u /* J M D */
+#define PCRE2_UCP 0x00010000u /* C J M D */
+#define PCRE2_UNGREEDY 0x00020000u /* C */
+#define PCRE2_UTF 0x00040000u /* C J M D */
/* These are for pcre2_jit_compile(). */
Modified: code/trunk/src/pcre2_dfa_match.c
===================================================================
--- code/trunk/src/pcre2_dfa_match.c 2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/src/pcre2_dfa_match.c 2014-10-01 16:16:27 UTC (rev 87)
@@ -85,8 +85,7 @@
#define PUBLIC_DFA_MATCH_OPTIONS \
(PCRE2_ANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
- PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART| \
- PCRE2_NO_START_OPTIMIZE)
+ PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART)
/*************************************************
@@ -3319,12 +3318,12 @@
/* There are some optimizations that avoid running the match if a known
starting point is not found, or if a known later code unit is not present.
- However, there is an option (settable at compile or match time) that disables
+ However, there is an option (settable at compile time) that disables
these, for testing and for ensuring that all callouts do actually occur.
- The must also be avoided when restarting a DFA match. */
+ The optimizations must also be avoided when restarting a DFA match. */
- if (((options | re->overall_options) &
- (PCRE2_NO_START_OPTIMIZE|PCRE2_DFA_RESTART)) == 0)
+ if ((re->overall_options & PCRE2_NO_START_OPTIMIZE) == 0 &&
+ (options & PCRE2_DFA_RESTART) == 0)
{
PCRE2_SPTR save_end_subject = end_subject;
Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c 2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/src/pcre2_match.c 2014-10-01 16:16:27 UTC (rev 87)
@@ -55,7 +55,7 @@
#define PUBLIC_MATCH_OPTIONS \
(PCRE2_ANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
- PCRE2_PARTIAL_SOFT|PCRE2_NO_START_OPTIMIZE)
+ PCRE2_PARTIAL_SOFT)
#define PUBLIC_JIT_MATCH_OPTIONS \
(PCRE2_NO_UTF_CHECK|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY|\
@@ -6687,10 +6687,10 @@
/* There are some optimizations that avoid running the match if a known
starting point is not found, or if a known later code unit is not present.
- However, there is an option (settable at compile or match time) that disables
- these, for testing and for ensuring that all callouts do actually occur. */
+ However, there is an option (settable at compile time) that disables these,
+ for testing and for ensuring that all callouts do actually occur. */
- if (((options | re->overall_options) & PCRE2_NO_START_OPTIMIZE) == 0)
+ if ((re->overall_options & PCRE2_NO_START_OPTIMIZE) == 0)
{
PCRE2_SPTR save_end_subject = end_subject;
Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c 2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/src/pcre2test.c 2014-10-01 16:16:27 UTC (rev 87)
@@ -461,7 +461,7 @@
{ "newline", MOD_CTB, MOD_NL, MO(newline_convention), CO(newline_convention) },
{ "no_auto_capture", MOD_PAT, MOD_OPT, PCRE2_NO_AUTO_CAPTURE, PO(options) },
{ "no_auto_possess", MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS, PO(options) },
- { "no_start_optimize", MOD_PDP, MOD_OPT, PCRE2_NO_START_OPTIMIZE, PD(options) },
+ { "no_start_optimize", MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE, PO(options) },
{ "no_utf_check", MOD_PD, MOD_OPT, PCRE2_NO_UTF_CHECK, PD(options) },
{ "notbol", MOD_DAT, MOD_OPT, PCRE2_NOTBOL, DO(options) },
{ "notempty", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY, DO(options) },
@@ -3058,11 +3058,10 @@
static void
show_match_options(uint32_t options)
{
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s",
((options & PCRE2_ANCHORED) != 0)? " anchored" : "",
((options & PCRE2_DFA_RESTART) != 0)? " dfa_restart" : "",
((options & PCRE2_DFA_SHORTEST) != 0)? " dfa_shortest" : "",
- ((options & PCRE2_NO_START_OPTIMIZE) != 0)? " no_start_optimize" : "",
((options & PCRE2_NO_UTF_CHECK) != 0)? " no_utf_check" : "",
((options & PCRE2_NOTBOL) != 0)? " notbol" : "",
((options & PCRE2_NOTEMPTY) != 0)? " notempty" : "",
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/testdata/testinput2 2014-10-01 16:16:27 UTC (rev 87)
@@ -2491,13 +2491,16 @@
/xyz/auto_callout
xyz
abcxyz
- abcxyz\=no_start_optimize
** Failers
abc
- abc\=no_start_optimize
abcxypqr
- abcxypqr\=no_start_optimize
+/xyz/auto_callout,no_start_optimize
+ abcxyz
+ ** Failers
+ abc
+ abcxypqr
+
/(*NO_START_OPT)xyz/auto_callout
abcxyz
@@ -2987,8 +2990,10 @@
/(*COMMIT)ABC/
ABCDEFG
+
+/(*COMMIT)ABC/no_start_optimize
** Failers
- DEFGABC\=no_start_optimize
+ DEFGABC
/^(ab (c+(*THEN)cd) | xyz)/x
abcccd
Modified: code/trunk/testdata/testinput6
===================================================================
(Binary files differ)
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/testdata/testoutput2 2014-10-01 16:16:27 UTC (rev 87)
@@ -8941,7 +8941,15 @@
+2 ^ ^ z
+3 ^ ^
0: xyz
- abcxyz\=no_start_optimize
+ ** Failers
+No match
+ abc
+No match
+ abcxypqr
+No match
+
+/xyz/auto_callout,no_start_optimize
+ abcxyz
--->abcxyz
+0 ^ x
+0 ^ x
@@ -8952,10 +8960,20 @@
+3 ^ ^
0: xyz
** Failers
+--->** Failers
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
+ +0 ^ x
No match
abc
-No match
- abc\=no_start_optimize
--->abc
+0 ^ x
+0 ^ x
@@ -8963,8 +8981,6 @@
+0 ^ x
No match
abcxypqr
-No match
- abcxypqr\=no_start_optimize
--->abcxypqr
+0 ^ x
+0 ^ x
@@ -10182,9 +10198,11 @@
/(*COMMIT)ABC/
ABCDEFG
0: ABC
+
+/(*COMMIT)ABC/no_start_optimize
** Failers
No match
- DEFGABC\=no_start_optimize
+ DEFGABC
No match
/^(ab (c+(*THEN)cd) | xyz)/x
Modified: code/trunk/testdata/testoutput6
===================================================================
(Binary files differ)
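
Illustration (not part of the commit): the (*COMMIT)ABC case described in the
pcre2api.3 text above can be modelled with a small stand-alone sketch. It is a
toy and does not call PCRE2. The function name toy_commit_abc and the flag
TOY_NO_START_OPTIMIZE are hypothetical names invented for this example, and the
flag value is not the real PCRE2 bit. The sketch assumes the only start-up
optimization in play is the scan for the first character "A".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag for this sketch only; NOT the PCRE2 bit value. */
#define TOY_NO_START_OPTIMIZE 0x1u

/* Toy model of matching the pattern (*COMMIT)ABC against a subject.
   Returns the offset of the match, or -1 for "no match". */
static int toy_commit_abc(const char *subject, unsigned compile_options)
{
    size_t len = strlen(subject);
    size_t start = 0;

    if ((compile_options & TOY_NO_START_OPTIMIZE) == 0) {
        /* Start-up optimization: pre-scan for the required first character
           before the pattern itself is run. */
        const char *a = strchr(subject, 'A');
        if (a == NULL)
            return -1;              /* fail without running the matcher */
        start = (size_t)(a - subject);
    }

    /* (*COMMIT) is passed at the first attempt, so only one starting
       position is ever tried, whatever the outcome. */
    if (len - start >= 3 && memcmp(subject + start, "ABC", 3) == 0)
        return (int)start;
    return -1;
}
```

In this model, the subject "DEFABC" matches at offset 3 when the option is
unset and gives "no match" when it is set. The option is passed with the
compile-time options, mirroring the change in this revision.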