[Pcre-svn] [576] code/trunk: Added support for (*NO_START_OP…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [576] code/trunk: Added support for (*NO_START_OPT)
Revision: 576
          http://vcs.pcre.org/viewvc?view=rev&revision=576
Author:   ph10
Date:     2010-11-21 18:45:10 +0000 (Sun, 21 Nov 2010)


Log Message:
-----------
Added support for (*NO_START_OPT)

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcrepattern.3
    code/trunk/doc/pcresyntax.3
    code/trunk/doc/pcretest.1
    code/trunk/pcre.h.in
    code/trunk/pcre_compile.c
    code/trunk/pcre_dfa_exec.c
    code/trunk/pcre_exec.c
    code/trunk/pcre_internal.h
    code/trunk/pcretest.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testinput7
    code/trunk/testdata/testoutput2
    code/trunk/testdata/testoutput7


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/ChangeLog    2010-11-21 18:45:10 UTC (rev 576)
@@ -113,6 +113,13 @@
     compile-time error is now given if \c is not followed by an ASCII 
     character, that is, a byte less than 128. (In EBCDIC mode, the code is 
     different, and any byte value is allowed.)
+    
+20. Recognize (*NO_START_OPT) at the start of a pattern to set the PCRE_NO_
+    START_OPTIMIZE option, which is now allowed at compile time - but just
+    passed through to pcre_exec() or pcre_dfa_exec(). This makes it available
+    to pcregrep and other applications that have no direct access to PCRE 
+    options. The new /Y option in pcretest sets this option when calling 
+    pcre_compile().  



Version 8.10 25-Jun-2010

Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/doc/pcreapi.3    2010-11-21 18:45:10 UTC (rev 576)
@@ -428,8 +428,9 @@
 documentation). For those options that can be different in different parts of
 the pattern, the contents of the \fIoptions\fP argument specifies their
 settings at the start of compilation and execution. The PCRE_ANCHORED,
-PCRE_BSR_\fIxxx\fP, and PCRE_NEWLINE_\fIxxx\fP options can be set at the time
-of matching as well as at compile time.
+PCRE_BSR_\fIxxx\fP, PCRE_NEWLINE_\fIxxx\fP, PCRE_NO_UTF8_CHECK, and
+PCRE_NO_START_OPT options can be set at the time of matching as well as at
+compile time.
 .P
 If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.
 Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns
@@ -658,6 +659,17 @@
 they acquire numbers in the usual way). There is no equivalent of this option
 in Perl.
 .sp
+  NO_START_OPTIMIZE
+.sp
+This is an option that acts at matching time; that is, it is really an option
+for \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. If it is set at compile time,
+it is remembered with the compiled pattern and assumed at matching time. For
+details see the discussion of PCRE_NO_START_OPTIMIZE
+.\" HTML <a href="#execoptions">
+.\" </a>
+below.
+.\"
+.sp
   PCRE_UCP
 .sp
 This option changes the way PCRE processes \eB, \eb, \eD, \ed, \eS, \es, \eW,
@@ -1487,7 +1499,10 @@
 The PCRE_NO_START_OPTIMIZE option disables the start-up optimizations, possibly
 causing performance to suffer, but ensuring that in cases where the result is
 "no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
-are considered at every possible starting position in the subject string.
+are considered at every possible starting position in the subject string. If 
+PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching 
+time.
+.P
 Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.
 Consider the pattern
 .sp
@@ -2252,6 +2267,6 @@
 .rs
 .sp
 .nf
-Last updated: 13 November 2010
+Last updated: 21 November 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/doc/pcrepattern.3    2010-11-21 18:45:10 UTC (rev 576)
@@ -52,6 +52,11 @@
 instead of recognizing only characters with codes less than 128 via a lookup
 table.
 .P
+If a pattern starts with (*NO_START_OPT), it has the same effect as setting the
+PCRE_NO_START_OPTIMIZE option either at compile or matching time. There are 
+also some more of these special sequences that are concerned with the handling 
+of newlines; they are described below.
+.P
 The remainder of this document discusses the patterns that are supported by
 PCRE when its main matching function, \fBpcre_exec()\fP, is used.
 From release 6.0, PCRE offers a second matching function,


Modified: code/trunk/doc/pcresyntax.3
===================================================================
--- code/trunk/doc/pcresyntax.3    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/doc/pcresyntax.3    2010-11-21 18:45:10 UTC (rev 576)
@@ -24,7 +24,7 @@
 .rs
 .sp
   \ea         alarm, that is, the BEL character (hex 07)
-  \ecx        "control-x", where x is any character
+  \ecx        "control-x", where x is any ASCII character
   \ee         escape (hex 1B)
   \ef         formfeed (hex 0C)
   \en         newline (hex 0A)
@@ -336,6 +336,7 @@
 The following are recognized only at the start of a pattern or after one of the
 newline-setting options with similar syntax:
 .sp
+  (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
   (*UTF8)         set UTF-8 mode (PCRE_UTF8)
   (*UCP)          set PCRE_UCP (use Unicode properties for \ed etc)
 .
@@ -473,6 +474,6 @@
 .rs
 .sp
 .nf
-Last updated: 12 May 2010
+Last updated: 21 November 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcretest.1
===================================================================
--- code/trunk/doc/pcretest.1    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/doc/pcretest.1    2010-11-21 18:45:10 UTC (rev 576)
@@ -179,6 +179,7 @@
   \fB/U\fP              PCRE_UNGREEDY
   \fB/W\fP              PCRE_UCP
   \fB/X\fP              PCRE_EXTRA
+  \fB/Y\fP              PCRE_NO_START_OPTIMIZE 
   \fB/<JS>\fP           PCRE_JAVASCRIPT_COMPAT
   \fB/<cr>\fP           PCRE_NEWLINE_CR
   \fB/<lf>\fP           PCRE_NEWLINE_LF
@@ -778,6 +779,6 @@
 .rs
 .sp
 .nf
-Last updated: 07 November 2010
+Last updated: 21 November 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi


Modified: code/trunk/pcre.h.in
===================================================================
--- code/trunk/pcre.h.in    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/pcre.h.in    2010-11-21 18:45:10 UTC (rev 576)
@@ -129,7 +129,7 @@
 #define PCRE_BSR_ANYCRLF        0x00800000  /* Compile, exec, DFA exec */
 #define PCRE_BSR_UNICODE        0x01000000  /* Compile, exec, DFA exec */
 #define PCRE_JAVASCRIPT_COMPAT  0x02000000  /* Compile */
-#define PCRE_NO_START_OPTIMIZE  0x04000000  /* Exec, DFA exec */
+#define PCRE_NO_START_OPTIMIZE  0x04000000  /* Compile, exec, DFA exec */
 #define PCRE_NO_START_OPTIMISE  0x04000000  /* Synonym */
 #define PCRE_PARTIAL_HARD       0x08000000  /* Exec, DFA exec */
 #define PCRE_NOTEMPTY_ATSTART   0x10000000  /* Exec, DFA exec */


Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/pcre_compile.c    2010-11-21 18:45:10 UTC (rev 576)
@@ -6859,6 +6859,8 @@
     { skipatstart += 7; options |= PCRE_UTF8; continue; }
   else if (strncmp((char *)(ptr+skipatstart+2), STRING_UCP_RIGHTPAR, 4) == 0)
     { skipatstart += 6; options |= PCRE_UCP; continue; }
+  else if (strncmp((char *)(ptr+skipatstart+2), STRING_NO_START_OPT_RIGHTPAR, 13) == 0)
+    { skipatstart += 15; options |= PCRE_NO_START_OPTIMIZE; continue; }


   if (strncmp((char *)(ptr+skipatstart+2), STRING_CR_RIGHTPAR, 3) == 0)
     { skipatstart += 5; newnl = PCRE_NEWLINE_CR; }


Modified: code/trunk/pcre_dfa_exec.c
===================================================================
--- code/trunk/pcre_dfa_exec.c    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/pcre_dfa_exec.c    2010-11-21 18:45:10 UTC (rev 576)
@@ -3057,9 +3057,11 @@


     /* There are some optimizations that avoid running the match if a known
     starting point is not found. However, there is an option that disables
-    these, for testing and for ensuring that all callouts do actually occur. */
+    these, for testing and for ensuring that all callouts do actually occur. 
+    The option can be set in the regex by (*NO_START_OPT) or passed in
+    match-time options. */


-    if ((options & PCRE_NO_START_OPTIMIZE) == 0)
+    if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0)
       {
       /* Advance to a known first byte. */



Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/pcre_exec.c    2010-11-21 18:45:10 UTC (rev 576)
@@ -5936,9 +5936,10 @@
   /* There are some optimizations that avoid running the match if a known
   starting point is not found, or if a known later character is not present.
   However, there is an option that disables these, for testing and for ensuring
-  that all callouts do actually occur. */
+  that all callouts do actually occur. The option can be set in the regex by 
+  (*NO_START_OPT) or passed in match-time options. */


-  if ((options & PCRE_NO_START_OPTIMIZE) == 0)
+  if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0)
     {
     /* Advance to a unique first byte if there is one. */



Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/pcre_internal.h    2010-11-21 18:45:10 UTC (rev 576)
@@ -615,7 +615,7 @@
    PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8| \
    PCRE_NO_AUTO_CAPTURE|PCRE_NO_UTF8_CHECK|PCRE_AUTO_CALLOUT|PCRE_FIRSTLINE| \
    PCRE_DUPNAMES|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \
-   PCRE_JAVASCRIPT_COMPAT|PCRE_UCP)
+   PCRE_JAVASCRIPT_COMPAT|PCRE_UCP|PCRE_NO_START_OPTIMIZE)


 #define PUBLIC_EXEC_OPTIONS \
 (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \
@@ -932,15 +932,16 @@

 #define STRING_DEFINE               "DEFINE"


-#define STRING_CR_RIGHTPAR          "CR)"
-#define STRING_LF_RIGHTPAR          "LF)"
-#define STRING_CRLF_RIGHTPAR        "CRLF)"
-#define STRING_ANY_RIGHTPAR         "ANY)"
-#define STRING_ANYCRLF_RIGHTPAR     "ANYCRLF)"
-#define STRING_BSR_ANYCRLF_RIGHTPAR "BSR_ANYCRLF)"
-#define STRING_BSR_UNICODE_RIGHTPAR "BSR_UNICODE)"
-#define STRING_UTF8_RIGHTPAR        "UTF8)"
-#define STRING_UCP_RIGHTPAR         "UCP)"
+#define STRING_CR_RIGHTPAR             "CR)"
+#define STRING_LF_RIGHTPAR             "LF)"
+#define STRING_CRLF_RIGHTPAR           "CRLF)"
+#define STRING_ANY_RIGHTPAR            "ANY)"
+#define STRING_ANYCRLF_RIGHTPAR        "ANYCRLF)"
+#define STRING_BSR_ANYCRLF_RIGHTPAR    "BSR_ANYCRLF)"
+#define STRING_BSR_UNICODE_RIGHTPAR    "BSR_UNICODE)"
+#define STRING_UTF8_RIGHTPAR           "UTF8)"
+#define STRING_UCP_RIGHTPAR            "UCP)"
+#define STRING_NO_START_OPT_RIGHTPAR   "NO_START_OPT)"


 #else /* SUPPORT_UTF8 */

@@ -1186,15 +1187,16 @@

 #define STRING_DEFINE               STR_D STR_E STR_F STR_I STR_N STR_E


-#define STRING_CR_RIGHTPAR          STR_C STR_R STR_RIGHT_PARENTHESIS
-#define STRING_LF_RIGHTPAR          STR_L STR_F STR_RIGHT_PARENTHESIS
-#define STRING_CRLF_RIGHTPAR        STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
-#define STRING_ANY_RIGHTPAR         STR_A STR_N STR_Y STR_RIGHT_PARENTHESIS
-#define STRING_ANYCRLF_RIGHTPAR     STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
-#define STRING_BSR_ANYCRLF_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
-#define STRING_BSR_UNICODE_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_U STR_N STR_I STR_C STR_O STR_D STR_E STR_RIGHT_PARENTHESIS
-#define STRING_UTF8_RIGHTPAR        STR_U STR_T STR_F STR_8 STR_RIGHT_PARENTHESIS
-#define STRING_UCP_RIGHTPAR         STR_U STR_C STR_P STR_RIGHT_PARENTHESIS
+#define STRING_CR_RIGHTPAR             STR_C STR_R STR_RIGHT_PARENTHESIS
+#define STRING_LF_RIGHTPAR             STR_L STR_F STR_RIGHT_PARENTHESIS
+#define STRING_CRLF_RIGHTPAR           STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
+#define STRING_ANY_RIGHTPAR            STR_A STR_N STR_Y STR_RIGHT_PARENTHESIS
+#define STRING_ANYCRLF_RIGHTPAR        STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
+#define STRING_BSR_ANYCRLF_RIGHTPAR    STR_B STR_S STR_R STR_UNDERSCORE STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
+#define STRING_BSR_UNICODE_RIGHTPAR    STR_B STR_S STR_R STR_UNDERSCORE STR_U STR_N STR_I STR_C STR_O STR_D STR_E STR_RIGHT_PARENTHESIS
+#define STRING_UTF8_RIGHTPAR           STR_U STR_T STR_F STR_8 STR_RIGHT_PARENTHESIS
+#define STRING_UCP_RIGHTPAR            STR_U STR_C STR_P STR_RIGHT_PARENTHESIS
+#define STRING_NO_START_OPT_RIGHTPAR   STR_N STR_O STR_UNDERSCORE STR_S STR_T STR_A STR_R STR_T STR_UNDERSCORE STR_O STR_P STR_T STR_RIGHT_PARENTHESIS


 #endif /* SUPPORT_UTF8 */


Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/pcretest.c    2010-11-21 18:45:10 UTC (rev 576)
@@ -1588,6 +1588,7 @@
       case 'U': options |= PCRE_UNGREEDY; break;
       case 'W': options |= PCRE_UCP; break;
       case 'X': options |= PCRE_EXTRA; break;
+      case 'Y': options |= PCRE_NO_START_OPTIMISE; break;
       case 'Z': debug_lengths = 0; break;
       case '8': options |= PCRE_UTF8; use_utf8 = 1; break;
       case '?': options |= PCRE_NO_UTF8_CHECK; break;
@@ -1924,7 +1925,7 @@
       if (do_flip) all_options = byteflip(all_options, sizeof(all_options));


       if (get_options == 0) fprintf(outfile, "No options\n");
-        else fprintf(outfile, "Options:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+        else fprintf(outfile, "Options:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
           ((get_options & PCRE_ANCHORED) != 0)? " anchored" : "",
           ((get_options & PCRE_CASELESS) != 0)? " caseless" : "",
           ((get_options & PCRE_EXTENDED) != 0)? " extended" : "",
@@ -1940,6 +1941,7 @@
           ((get_options & PCRE_UTF8) != 0)? " utf8" : "",
           ((get_options & PCRE_UCP) != 0)? " ucp" : "",
           ((get_options & PCRE_NO_UTF8_CHECK) != 0)? " no_utf8_check" : "",
+          ((get_options & PCRE_NO_START_OPTIMIZE) != 0)? " no_start_optimize" : "",
           ((get_options & PCRE_DUPNAMES) != 0)? " dupnames" : "");


       if (jchanged) fprintf(outfile, "Duplicate name status changes\n");


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/testdata/testinput2    2010-11-21 18:45:10 UTC (rev 576)
@@ -2584,7 +2584,13 @@
   abc\Y
   abcxypqr  
   abcxypqr\Y  
+  
+/(*NO_START_OPT)xyz/C
+  abcxyz 


+/xyz/CY
+  abcxyz 
+
 /^"((?(?=[a])[^"])|b)*"$/C
     "ab"



Modified: code/trunk/testdata/testinput7
===================================================================
--- code/trunk/testdata/testinput7    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/testdata/testinput7    2010-11-21 18:45:10 UTC (rev 576)
@@ -4411,6 +4411,9 @@
   abc\Y
   abcxypqr  
   abcxypqr\Y  
+
+/(*NO_START_OPT)xyz/C
+  abcxyz 


/(?C)ab/
ab

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/testdata/testoutput2    2010-11-21 18:45:10 UTC (rev 576)
@@ -9294,7 +9294,31 @@
  +0        ^     x
  +0         ^    x
 No match
+  
+/(*NO_START_OPT)xyz/C
+  abcxyz 
+--->abcxyz
++15 ^          x
++15  ^         x
++15   ^        x
++15    ^       x
++16    ^^      y
++17    ^ ^     z
++18    ^  ^    
+ 0: xyz


+/xyz/CY
+  abcxyz 
+--->abcxyz
+ +0 ^          x
+ +0  ^         x
+ +0   ^        x
+ +0    ^       x
+ +1    ^^      y
+ +2    ^ ^     z
+ +3    ^  ^    
+ 0: xyz
+
 /^"((?(?=[a])[^"])|b)*"$/C
     "ab"
 --->"ab"


Modified: code/trunk/testdata/testoutput7
===================================================================
--- code/trunk/testdata/testoutput7    2010-11-21 12:55:42 UTC (rev 575)
+++ code/trunk/testdata/testoutput7    2010-11-21 18:45:10 UTC (rev 576)
@@ -7319,6 +7319,18 @@
  +0        ^     x
  +0         ^    x
 No match
+
+/(*NO_START_OPT)xyz/C
+  abcxyz 
+--->abcxyz
++15 ^          x
++15  ^         x
++15   ^        x
++15    ^       x
++16    ^^      y
++17    ^ ^     z
++18    ^  ^    
+ 0: xyz


/(?C)ab/
ab
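
Illustrative note (not part of the commit): the two mechanisms this revision
touches can be sketched in isolation. The sketch below is not the PCRE source.
The sk_ names and SK_ macros are hypothetical. Only the 0x04000000 value, the
13-byte comparison, and the 15-byte skip are taken from the hunks above; the
other two flag values are placeholders. It assumes, as the pcre_compile.c hunk
suggests, that leading "(*NAME)" items are consumed in a loop, and it mirrors
the combined test added to pcre_exec.c and pcre_dfa_exec.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags for this sketch. The first value matches the one shown
   in the pcre.h.in hunk; the other two are placeholders, not PCRE's values. */
#define SK_NO_START_OPTIMIZE 0x04000000u
#define SK_UTF8              0x00000001u
#define SK_UCP               0x00000002u

/* Scan leading "(*NAME)" items and accumulate the options they set.
   The number of pattern bytes consumed is stored in *skip. */
static unsigned int sk_scan_prefix(const char *pattern, size_t *skip)
{
  unsigned int options = 0;
  size_t pos = 0;

  assert(pattern != NULL && skip != NULL);

  while (pattern[pos] == '(' && pattern[pos + 1] == '*')
    {
    const char *name = pattern + pos + 2;   /* text after "(*" */

    if (strncmp(name, "UTF8)", 5) == 0)
      { pos += 7; options |= SK_UTF8; }
    else if (strncmp(name, "UCP)", 4) == 0)
      { pos += 6; options |= SK_UCP; }
    else if (strncmp(name, "NO_START_OPT)", 13) == 0)
      { pos += 15; options |= SK_NO_START_OPTIMIZE; }
    else break;                             /* not an item handled here */
    }

  *skip = pos;
  return options;
}

/* Start-up optimizations run only when the flag is absent from both the
   options stored with the compiled pattern and the match-time options. */
static int sk_start_opt_enabled(unsigned int compiled_options,
                                unsigned int exec_options)
{
  return ((exec_options | compiled_options) & SK_NO_START_OPTIMIZE) == 0;
}
```

For the pattern "(*NO_START_OPT)xyz" the sketch consumes 15 bytes, which is
consistent with the callouts in the new test output being reported at pattern
offsets +15 to +18 rather than +0 to +3. Because the two option words are
OR-ed together before the test, a flag set at compile time cannot be cleared
by the match-time options, as the pcreapi.3 hunk states.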