[Pcre-svn] [87] code/trunk: Make PCRE2_NO_START_OPTIMIZE a c…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [87] code/trunk: Make PCRE2_NO_START_OPTIMIZE a compile-only option.
Revision: 87
          http://www.exim.org/viewvc/pcre2?view=rev&revision=87
Author:   ph10
Date:     2014-10-01 17:16:27 +0100 (Wed, 01 Oct 2014)


Log Message:
-----------
Make PCRE2_NO_START_OPTIMIZE a compile-only option.

Modified Paths:
--------------
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2callout.3
    code/trunk/doc/pcre2jit.3
    code/trunk/doc/pcre2test.1
    code/trunk/src/pcre2.h.in
    code/trunk/src/pcre2_dfa_match.c
    code/trunk/src/pcre2_match.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testinput6
    code/trunk/testdata/testoutput2
    code/trunk/testdata/testoutput6


Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/doc/pcre2api.3    2014-10-01 16:16:27 UTC (rev 87)
@@ -930,9 +930,8 @@
 .P
 For those options that can be different in different parts of the pattern, the
 contents of the \fIoptions\fP argument specifies their settings at the start of
-compilation. The PCRE2_ANCHORED, PCRE2_NO_UTF_CHECK, and
-PCRE2_NO_START_OPTIMIZE options can be set at the time of matching as well as
-at compile time.
+compilation. The PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK options can be set at
+the time of matching as well as at compile time.
 .P
 Other, less frequently required compile-time parameters (for example, the 
 newline setting) can be provided in a compile context (as described
@@ -1150,18 +1149,53 @@
 .sp
   PCRE2_NO_START_OPTIMIZE
 .sp
-This is an option that acts at matching time; that is, it is really an option
-for \fBpcre2_match()\fP or \fBpcre_dfa_match()\fP. If it is set at compile
-time, it is remembered with the compiled pattern and assumed at matching time.
-This is necessary if you want to use JIT execution, because the JIT compiler
-needs to know whether or not this option is set. For details, see the
-discussion of PCRE2_NO_START_OPTIMIZE in the section on \fBpcre2_match()\fP 
-options
-.\" HTML <a href="#matchoptions">
-.\" </a>
-below.
-.\"
+This is an option whose main effect is at matching time. It does not change
+what \fBpcre2_compile()\fP generates, but it does affect the output of the JIT
+compiler.
+.P
+There are a number of optimizations that may occur at the start of a match, in
+order to speed up the process. For example, if it is known that an unanchored
+match must start with a specific character, the matching code searches the
+subject for that character, and fails immediately if it cannot find it, without
+actually running the main matching function. This means that a special item
+such as (*COMMIT) at the start of a pattern is not considered until after a
+suitable starting point for the match has been found. Also, when callouts or
+(*MARK) items are in use, these "start-up" optimizations can cause them to be
+skipped if the pattern is never actually used. The start-up optimizations are
+in effect a pre-scan of the subject that takes place before the pattern is run.
+.P
+The PCRE2_NO_START_OPTIMIZE option disables the start-up optimizations,
+possibly causing performance to suffer, but ensuring that in cases where the
+result is "no match", the callouts do occur, and that items such as (*COMMIT)
+and (*MARK) are considered at every possible starting position in the subject
+string.
+.P
+Setting PCRE2_NO_START_OPTIMIZE may change the outcome of a matching operation.
+Consider the pattern
 .sp
+  (*COMMIT)ABC
+.sp
+When this is compiled, PCRE2 records the fact that a match must start with the
+character "A". Suppose the subject string is "DEFABC". The start-up
+optimization scans along the subject, finds "A" and runs the first match
+attempt from there. The (*COMMIT) item means that the pattern must match the
+current starting position, which in this case, it does. However, if the same
+match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
+subject string does not happen. The first match attempt is run starting from
+"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
+the overall result is "no match". There are also other start-up optimizations.
+For example, a minimum length for the subject may be recorded. Consider the
+pattern
+.sp
+  (*MARK:A)(X|Y)
+.sp
+The minimum length for a match is one character. If the subject is "ABC", there
+will be attempts to match "ABC", "BC", and "C". An attempt to match an empty 
+string at the end of the subject does not take place, because PCRE2 knows that
+the subject is now too short, and so the (*MARK) is never encountered. In this
+case, the optimization does not affect the overall match result, which is still
+"no match", but it does affect the auxiliary information that is returned.
+.sp
   PCRE2_NO_UTF_CHECK
 .sp
 When PCRE2_UTF is set, the validity of the pattern as a UTF string is
@@ -1787,10 +1821,9 @@
 .rs
 .sp
 The unused bits of the \fIoptions\fP argument for \fBpcre2_match()\fP must be
-zero. The only bits that may be set are PCRE2_ANCHORED, 
-PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_START_OPTIMIZE, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
-PCRE2_PARTIAL_SOFT. Their action is described below.
+zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_NOTBOL,
+PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK,
+PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
 .P
 If the pattern was successfully processed by the just-in-time (JIT) compiler,
 the only supported options for matching using the JIT code are PCRE2_NOTBOL,
@@ -1841,54 +1874,6 @@
 the start of the subject is permitted. If the pattern is anchored, such a match
 can occur only if the pattern contains \eK.
 .sp
-  PCRE2_NO_START_OPTIMIZE
-.sp
-There are a number of optimizations that \fBpcre2_match()\fP uses at the start
-of a match, in order to speed up the process. For example, if it is known that
-an unanchored match must start with a specific character, it searches the
-subject for that character, and fails immediately if it cannot find it, without
-actually running the main matching function. This means that a special item
-such as (*COMMIT) at the start of a pattern is not considered until after a
-suitable starting point for the match has been found. Also, when callouts or
-(*MARK) items are in use, these "start-up" optimizations can cause them to be
-skipped if the pattern is never actually used. The start-up optimizations are
-in effect a pre-scan of the subject that takes place before the pattern is run.
-.P
-The PCRE2_NO_START_OPTIMIZE option disables the start-up optimizations,
-possibly causing performance to suffer, but ensuring that in cases where the
-result is "no match", the callouts do occur, and that items such as (*COMMIT)
-and (*MARK) are considered at every possible starting position in the subject
-string. If PCRE2_NO_START_OPTIMIZE is set at compile time, it cannot be unset
-at matching time. The use of PCRE2_NO_START_OPTIMIZE at matching time (that is,
-passing it to \fBpcre2_match()\fP) disables JIT execution; in this situation,
-matching is always done using interpretively.
-.P
-Setting PCRE2_NO_START_OPTIMIZE can change the outcome of a matching operation.
-Consider the pattern
-.sp
-  (*COMMIT)ABC
-.sp
-When this is compiled, PCRE2 records the fact that a match must start with the
-character "A". Suppose the subject string is "DEFABC". The start-up
-optimization scans along the subject, finds "A" and runs the first match
-attempt from there. The (*COMMIT) item means that the pattern must match the
-current starting position, which in this case, it does. However, if the same
-match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
-subject string does not happen. The first match attempt is run starting from
-"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
-the overall result is "no match". There are also other start-up optimizations.
-For example, a minimum length for the subject may be recorded. Consider the
-pattern
-.sp
-  (*MARK:A)(X|Y)
-.sp
-The minimum length for a match is one character. If the subject is "ABC", there
-will be attempts to match "ABC", "BC", and "C". An attempt to match an empty 
-string at the end of the subject does not take place, because PCRE2 knows that
-the subject is now too short, and so the (*MARK) is never encountered. In this
-case, the optimization does not affect the overall match result, which is still
-"no match", but it does affect the auxiliary information that is returned.
-.sp
   PCRE2_NO_UTF_CHECK
 .sp
 When PCRE2_UTF is set at compile time, the validity of the subject as a UTF
@@ -2550,10 +2535,9 @@
 The unused bits of the \fIoptions\fP argument for \fBpcre2_dfa_match()\fP must
 be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_NOTBOL,
 PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK,
-PCRE2_NO_START_OPTIMIZE, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT,
-PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last four of these are
-exactly the same as for \fBpcre2_match()\fP, so their description is not
-repeated here.
+PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and
+PCRE2_DFA_RESTART. All but the last four of these are exactly the same as for
+\fBpcre2_match()\fP, so their description is not repeated here.
 .sp
   PCRE2_PARTIAL_HARD
   PCRE2_PARTIAL_SOFT


Modified: code/trunk/doc/pcre2callout.3
===================================================================
--- code/trunk/doc/pcre2callout.3    2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/doc/pcre2callout.3    2014-10-01 16:16:27 UTC (rev 87)
@@ -111,7 +111,7 @@
 long enough, or, for unanchored patterns, if it has been scanned far enough.
 .P
 You can disable these optimizations by passing the PCRE2_NO_START_OPTIMIZE
-option to the matching function, or by starting the pattern with
+option to \fBpcre2_compile()\fP, or by starting the pattern with
 (*NO_START_OPT). This slows down the matching process, but does ensure that
 callouts such as the example above are obeyed.
 .


Modified: code/trunk/doc/pcre2jit.3
===================================================================
--- code/trunk/doc/pcre2jit.3    2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/doc/pcre2jit.3    2014-10-01 16:16:27 UTC (rev 87)
@@ -107,9 +107,8 @@
 .sp
 The \fBpcre2_match()\fP options that are supported for JIT matching are
 PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The options 
-that are not supported at match time are PCRE2_ANCHORED and
-PCRE2_NO_START_OPTIMIZE, though they are supported if given at compile time.
+PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
+PCRE2_ANCHORED option is not supported at match time.
 .P
 The only unsupported pattern items are \eC (match a single data unit) when
 running in a UTF mode, and a callout immediately before an assertion condition


Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1    2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/doc/pcre2test.1    2014-10-01 16:16:27 UTC (rev 87)
@@ -662,7 +662,6 @@
       anchored                  set PCRE2_ANCHORED
       dfa_restart               set PCRE2_DFA_RESTART
       dfa_shortest              set PCRE2_DFA_SHORTEST
-      no_start_optimize         set PCRE2_NO_START_OPTIMIZE
       no_utf_check              set PCRE2_NO_UTF_CHECK
       notbol                    set PCRE2_NOTBOL
       notempty                  set PCRE2_NOTEMPTY


Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in    2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/src/pcre2.h.in    2014-10-01 16:16:27 UTC (rev 87)
@@ -86,8 +86,7 @@
 others can be added next to them */


 #define PCRE2_ANCHORED            0x80000000u
-#define PCRE2_NO_START_OPTIMIZE   0x40000000u
-#define PCRE2_NO_UTF_CHECK        0x20000000u
+#define PCRE2_NO_UTF_CHECK        0x40000000u


/* Other options that can be passed to pcre2_compile(). They may affect
compilation, JIT compilation, and/or interpretive execution. The following tags
@@ -95,7 +94,7 @@

C alters what is compiled
J alters what JIT compiles
-E is inspected during pcre2_match() execution
+M is inspected during pcre2_match() execution
D is inspected during pcre2_dfa_match() execution
*/

@@ -103,20 +102,21 @@
 #define PCRE2_ALT_BSUX            0x00000002u  /* C       */
 #define PCRE2_AUTO_CALLOUT        0x00000004u  /* C       */
 #define PCRE2_CASELESS            0x00000008u  /* C       */
-#define PCRE2_DOLLAR_ENDONLY      0x00000010u  /*   J E D */
+#define PCRE2_DOLLAR_ENDONLY      0x00000010u  /*   J M D */
 #define PCRE2_DOTALL              0x00000020u  /* C       */
 #define PCRE2_DUPNAMES            0x00000040u  /* C       */
 #define PCRE2_EXTENDED            0x00000080u  /* C       */
-#define PCRE2_FIRSTLINE           0x00000100u  /*   J E D */
-#define PCRE2_MATCH_UNSET_BACKREF 0x00000200u  /* C J E   */
+#define PCRE2_FIRSTLINE           0x00000100u  /*   J M D */
+#define PCRE2_MATCH_UNSET_BACKREF 0x00000200u  /* C J M   */
 #define PCRE2_MULTILINE           0x00000400u  /* C       */
 #define PCRE2_NEVER_UCP           0x00000800u  /* C       */
 #define PCRE2_NEVER_UTF           0x00001000u  /* C       */
 #define PCRE2_NO_AUTO_CAPTURE     0x00002000u  /* C       */
 #define PCRE2_NO_AUTO_POSSESS     0x00004000u  /* C       */
-#define PCRE2_UCP                 0x00008000u  /* C J E D */
-#define PCRE2_UNGREEDY            0x00010000u  /* C       */
-#define PCRE2_UTF                 0x00020000u  /* C J E D */
+#define PCRE2_NO_START_OPTIMIZE   0x00008000u  /*   J M D */
+#define PCRE2_UCP                 0x00010000u  /* C J M D */
+#define PCRE2_UNGREEDY            0x00020000u  /* C       */
+#define PCRE2_UTF                 0x00040000u  /* C J M D */


/* These are for pcre2_jit_compile(). */


Modified: code/trunk/src/pcre2_dfa_match.c
===================================================================
--- code/trunk/src/pcre2_dfa_match.c    2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/src/pcre2_dfa_match.c    2014-10-01 16:16:27 UTC (rev 87)
@@ -85,8 +85,7 @@
 #define PUBLIC_DFA_MATCH_OPTIONS \
   (PCRE2_ANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
    PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
-   PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART| \
-   PCRE2_NO_START_OPTIMIZE)
+   PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART)



/*************************************************
@@ -3319,12 +3318,12 @@

/* There are some optimizations that avoid running the match if a known
starting point is not found, or if a known later code unit is not present.
- However, there is an option (settable at compile or match time) that disables
+ However, there is an option (settable at compile time) that disables
these, for testing and for ensuring that all callouts do actually occur.
- The must also be avoided when restarting a DFA match. */
+ The optimizations must also be avoided when restarting a DFA match. */

-  if (((options | re->overall_options) &
-       (PCRE2_NO_START_OPTIMIZE|PCRE2_DFA_RESTART)) == 0)
+  if ((re->overall_options & PCRE2_NO_START_OPTIMIZE) == 0 &&
+      (options & PCRE2_DFA_RESTART) == 0)
     {
     PCRE2_SPTR save_end_subject = end_subject;



Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c    2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/src/pcre2_match.c    2014-10-01 16:16:27 UTC (rev 87)
@@ -55,7 +55,7 @@
 #define PUBLIC_MATCH_OPTIONS \
   (PCRE2_ANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
    PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
-   PCRE2_PARTIAL_SOFT|PCRE2_NO_START_OPTIMIZE)
+   PCRE2_PARTIAL_SOFT)


 #define PUBLIC_JIT_MATCH_OPTIONS \
    (PCRE2_NO_UTF_CHECK|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY|\
@@ -6687,10 +6687,10 @@


/* There are some optimizations that avoid running the match if a known
starting point is not found, or if a known later code unit is not present.
- However, there is an option (settable at compile or match time) that disables
- these, for testing and for ensuring that all callouts do actually occur. */
+ However, there is an option (settable at compile time) that disables these,
+ for testing and for ensuring that all callouts do actually occur. */

-  if (((options | re->overall_options) & PCRE2_NO_START_OPTIMIZE) == 0)
+  if ((re->overall_options & PCRE2_NO_START_OPTIMIZE) == 0)
     {
     PCRE2_SPTR save_end_subject = end_subject;



Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/src/pcre2test.c    2014-10-01 16:16:27 UTC (rev 87)
@@ -461,7 +461,7 @@
   { "newline",             MOD_CTB,  MOD_NL,  MO(newline_convention),    CO(newline_convention) },
   { "no_auto_capture",     MOD_PAT,  MOD_OPT, PCRE2_NO_AUTO_CAPTURE,     PO(options) },
   { "no_auto_possess",     MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS,     PO(options) },
-  { "no_start_optimize",   MOD_PDP,  MOD_OPT, PCRE2_NO_START_OPTIMIZE,   PD(options) },
+  { "no_start_optimize",   MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE,   PO(options) },
   { "no_utf_check",        MOD_PD,   MOD_OPT, PCRE2_NO_UTF_CHECK,        PD(options) },
   { "notbol",              MOD_DAT,  MOD_OPT, PCRE2_NOTBOL,              DO(options) },
   { "notempty",            MOD_DAT,  MOD_OPT, PCRE2_NOTEMPTY,            DO(options) },
@@ -3058,11 +3058,10 @@
 static void
 show_match_options(uint32_t options)
 {
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s",
   ((options & PCRE2_ANCHORED) != 0)? " anchored" : "",
   ((options & PCRE2_DFA_RESTART) != 0)? " dfa_restart" : "",
   ((options & PCRE2_DFA_SHORTEST) != 0)? " dfa_shortest" : "",
-  ((options & PCRE2_NO_START_OPTIMIZE) != 0)? " no_start_optimize" : "",
   ((options & PCRE2_NO_UTF_CHECK) != 0)? " no_utf_check" : "",
   ((options & PCRE2_NOTBOL) != 0)? " notbol" : "",
   ((options & PCRE2_NOTEMPTY) != 0)? " notempty" : "",


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/testdata/testinput2    2014-10-01 16:16:27 UTC (rev 87)
@@ -2491,13 +2491,16 @@
 /xyz/auto_callout
   xyz 
   abcxyz 
-  abcxyz\=no_start_optimize
   ** Failers 
   abc
-  abc\=no_start_optimize
   abcxypqr  
-  abcxypqr\=no_start_optimize


+/xyz/auto_callout,no_start_optimize
+ abcxyz
+ ** Failers
+ abc
+ abcxypqr
+
/(*NO_START_OPT)xyz/auto_callout
abcxyz

@@ -2987,8 +2990,10 @@

 /(*COMMIT)ABC/
     ABCDEFG
+    
+/(*COMMIT)ABC/no_start_optimize
     ** Failers
-    DEFGABC\=no_start_optimize
+    DEFGABC


 /^(ab (c+(*THEN)cd) | xyz)/x
     abcccd  


Modified: code/trunk/testdata/testinput6
===================================================================
(Binary files differ)

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2014-09-30 16:30:39 UTC (rev 86)
+++ code/trunk/testdata/testoutput2    2014-10-01 16:16:27 UTC (rev 87)
@@ -8941,7 +8941,15 @@
  +2    ^ ^     z
  +3    ^  ^    
  0: xyz
-  abcxyz\=no_start_optimize
+  ** Failers 
+No match
+  abc
+No match
+  abcxypqr  
+No match
+  
+/xyz/auto_callout,no_start_optimize
+  abcxyz 
 --->abcxyz
  +0 ^          x
  +0  ^         x
@@ -8952,10 +8960,20 @@
  +3    ^  ^    
  0: xyz
   ** Failers 
+--->** Failers
+ +0 ^              x
+ +0  ^             x
+ +0   ^            x
+ +0    ^           x
+ +0     ^          x
+ +0      ^         x
+ +0       ^        x
+ +0        ^       x
+ +0         ^      x
+ +0          ^     x
+ +0           ^    x
 No match
   abc
-No match
-  abc\=no_start_optimize
 --->abc
  +0 ^       x
  +0  ^      x
@@ -8963,8 +8981,6 @@
  +0    ^    x
 No match
   abcxypqr  
-No match
-  abcxypqr\=no_start_optimize
 --->abcxypqr
  +0 ^            x
  +0  ^           x
@@ -10182,9 +10198,11 @@
 /(*COMMIT)ABC/
     ABCDEFG
  0: ABC
+    
+/(*COMMIT)ABC/no_start_optimize
     ** Failers
 No match
-    DEFGABC\=no_start_optimize
+    DEFGABC
 No match


/^(ab (c+(*THEN)cd) | xyz)/x

Modified: code/trunk/testdata/testoutput6
===================================================================
(Binary files differ)