[Pcre-svn] [1022] code/trunk: Add support for PCRE_STUDY_EXT…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [1022] code/trunk: Add support for PCRE_STUDY_EXTRA_NEEDED.
Revision: 1022
          http://vcs.pcre.org/viewvc?view=rev&revision=1022
Author:   ph10
Date:     2012-08-28 13:28:15 +0100 (Tue, 28 Aug 2012)


Log Message:
-----------
Add support for PCRE_STUDY_EXTRA_NEEDED.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcretest.1
    code/trunk/pcre.h.in
    code/trunk/pcre_internal.h
    code/trunk/pcre_study.c
    code/trunk/pcretest.c
    code/trunk/testdata/testinput12
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput12
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2012-08-27 15:49:23 UTC (rev 1021)
+++ code/trunk/ChangeLog    2012-08-28 12:28:15 UTC (rev 1022)
@@ -55,6 +55,8 @@


 11. Patch by Daniel Richard G to the autoconf files to add a macro for sorting
     out POSIX threads when JIT support is configured. 
+    
+12. Added support for PCRE_STUDY_EXTRA_NEEDED. 





Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2012-08-27 15:49:23 UTC (rev 1021)
+++ code/trunk/doc/pcreapi.3    2012-08-28 12:28:15 UTC (rev 1022)
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "04 May 2012" "PCRE 8.31"
+.TH PCREAPI 3 "28 August 2012" "PCRE 8.32"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .sp
@@ -960,12 +960,16 @@
 in the section on matching a pattern.
 .P
 If studying the pattern does not produce any useful information,
-\fBpcre_study()\fP returns NULL. In that circumstance, if the calling program
-wants to pass any of the other fields to \fBpcre_exec()\fP or
-\fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block.
+\fBpcre_study()\fP returns NULL by default. In that circumstance, if the
+calling program wants to pass any of the other fields to \fBpcre_exec()\fP or
+\fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block. However, 
+if \fBpcre_study()\fP is called with the PCRE_STUDY_EXTRA_NEEDED option, it
+returns a \fBpcre_extra\fP block even if studying did not find any additional
+information. It may still return NULL, however, if an error occurs in
+\fBpcre_study()\fP.
 .P
 The second argument of \fBpcre_study()\fP contains option bits. There are three
-options:
+further options in addition to PCRE_STUDY_EXTRA_NEEDED:
 .sp
   PCRE_STUDY_JIT_COMPILE
   PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
@@ -974,7 +978,7 @@
 If any of these are set, and the just-in-time compiler is available, the
 pattern is further compiled into machine code that executes much faster than
 the \fBpcre_exec()\fP interpretive matching function. If the just-in-time
-compiler is not available, these options are ignored. All other bits in the
+compiler is not available, these options are ignored. All undefined bits in the
 \fIoptions\fP argument must be zero.
 .P
 JIT compilation is a heavyweight optimization. It can take some time for
@@ -1022,10 +1026,9 @@
 Studying a pattern does two things: first, a lower bound for the length of
 subject string that is needed to match the pattern is computed. This does not
 mean that there are any strings of that length that match, but it does
-guarantee that no shorter strings match. The value is used by
-\fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP to avoid wasting time by trying to
-match strings that are shorter than the lower bound. You can find out the value
-in a calling program via the \fBpcre_fullinfo()\fP function.
+guarantee that no shorter strings match. The value is used to avoid wasting
+time by trying to match strings that are shorter than the lower bound. You can
+find out the value in a calling program via the \fBpcre_fullinfo()\fP function.
 .P
 Studying a pattern is also useful for non-anchored patterns that do not have a
 single fixed starting character. A bitmap of possible starting bytes is
@@ -2667,6 +2670,6 @@
 .rs
 .sp
 .nf
-Last updated: 17 June 2012
+Last updated: 28 August 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcretest.1
===================================================================
--- code/trunk/doc/pcretest.1    2012-08-27 15:49:23 UTC (rev 1021)
+++ code/trunk/doc/pcretest.1    2012-08-28 12:28:15 UTC (rev 1022)
@@ -1,4 +1,4 @@
-.TH PCRETEST 1 "21 February 2012" "PCRE 8.31"
+.TH PCRETEST 1 "28 August 2012" "PCRE 8.32"
 .SH NAME
 pcretest - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -22,10 +22,19 @@
 .\" HREF
 \fBpcre16\fP
 .\"
-documentation. The input for \fBpcretest\fP is a sequence of regular expression
-patterns and strings to be matched, as described below. The output shows the
-result of each match. Options on the command line and the patterns control PCRE
-options and exactly what is output.
+documentation. 
+.P
+The input for \fBpcretest\fP is a sequence of regular expression patterns and
+strings to be matched, as described below. The output shows the result of each
+match. Options on the command line and the patterns control PCRE options and
+exactly what is output.
+.P
+As PCRE has evolved, it has acquired many different features, and as a result,
+\fBpcretest\fP now has rather a lot of obscure options for testing every
+possible feature. Some of these options are specifically designed for use in
+conjunction with the test script and data files that are distributed as part of
+PCRE, and are unlikely to be of use otherwise. They are all documented here, 
+but without much justification.
 .
 .
 .SH "PCRE's 8-BIT and 16-BIT LIBRARIES"
@@ -145,7 +154,10 @@
 If \fB-s++\fP is used instead of \fB-s+\fP (with or without a following digit),
 the text "(JIT)" is added to the first output line after a match or no match
 when JIT-compiled code was actually used.
-.P
+.sp
+Note that there are pattern options that can override \fB-s\fP, either 
+specifying no studying at all, or suppressing JIT compilation.
+.sp
 If the \fB/I\fP or \fB/D\fP option is present on a pattern (requesting output
 about the compiled pattern), information about the result of studying is not
 included when studying is caused only by \fB-s\fP and neither \fB-i\fP nor
@@ -391,15 +403,22 @@
 successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the
 JIT compiled code is also output.
 .P
-If the \fB/S\fP modifier appears once, it causes \fBpcre[16]_study()\fP to be
-called after the expression has been compiled, and the results used when the
-expression is matched. If \fB/S\fP appears twice, it suppresses studying, even
+The \fB/S\fP modifier causes \fBpcre[16]_study()\fP to be called after the
+expression has been compiled, and the results used when the expression is
+matched. There are a number of qualifying characters that may follow \fB/S\fP. 
+They may appear in any order.
+.P
+If \fBS\fP is followed by an exclamation mark, \fBpcre[16]_study()\fP is called 
+with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a 
+\fBpcre_extra\fP block, even when studying discovers no useful information.
+.P
+If \fB/S\fP is followed by a second S character, it suppresses studying, even
 if it was requested externally by the \fB-s\fP command line option. This makes
 it possible to specify that certain patterns are always studied, and others are
 never studied, independently of \fB-s\fP. This feature is used in the test
 files in a few cases where the output is different when the pattern is studied.
 .P
-If the \fB/S\fP modifier is immediately followed by a + character, the call to
+If the \fB/S\fP modifier is followed by a + character, the call to
 \fBpcre[16]_study()\fP is made with all the JIT study options, requesting
 just-in-time optimization support if it is available, for both normal and
 partial matching. If you want to restrict the JIT compiling modes, you can
@@ -428,6 +447,11 @@
 documentation. See also the \fB\eJ\fP escape sequence below for a way of
 setting the size of the JIT stack.
 .P
+Finally, if \fB/S\fP is followed by a minus character, JIT compilation is
+suppressed, even if it was requested externally by the \fB-s\fP command line
+option. This makes it possible to specify that JIT is never to be used for
+certain patterns.
+.P
 The \fB/T\fP modifier must be followed by a single digit. It causes a specific
 set of built-in character tables to be passed to \fBpcre[16]_compile()\fP. It
 is used in the standard PCRE tests to check behaviour with different character
@@ -966,6 +990,6 @@
 .rs
 .sp
 .nf
-Last updated: 21 February 2012
+Last updated: 28 August 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi


Modified: code/trunk/pcre.h.in
===================================================================
--- code/trunk/pcre.h.in    2012-08-27 15:49:23 UTC (rev 1021)
+++ code/trunk/pcre.h.in    2012-08-28 12:28:15 UTC (rev 1022)
@@ -259,6 +259,7 @@
 #define PCRE_STUDY_JIT_COMPILE                0x0001
 #define PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE   0x0002
 #define PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE   0x0004
+#define PCRE_STUDY_EXTRA_NEEDED               0x0008


/* Bit flags for the pcre[16]_extra structure. Do not re-arrange or redefine
these bits, just add new ones on the end, in order to remain compatible. */

Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h    2012-08-27 15:49:23 UTC (rev 1021)
+++ code/trunk/pcre_internal.h    2012-08-28 12:28:15 UTC (rev 1022)
@@ -893,7 +893,7 @@


 #define PUBLIC_STUDY_OPTIONS \
    (PCRE_STUDY_JIT_COMPILE|PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE| \
-    PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE)
+    PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE|PCRE_STUDY_EXTRA_NEEDED)


/* Magic number to provide a small check against being handed junk. */
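
The effect of widening PUBLIC_STUDY_OPTIONS can be sketched in isolation. The snippet below copies the four bit values from the pcre.h.in hunk and tests an options word against the mask. The helper name study_options_valid is hypothetical, and the assumption that any bit outside the mask is rejected rests on the documented rule that all undefined bits must be zero, not on code shown in this diff.

```c
/* Bit values as defined in the pcre.h.in hunk of this revision. */
#define PCRE_STUDY_JIT_COMPILE                0x0001
#define PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE   0x0002
#define PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE   0x0004
#define PCRE_STUDY_EXTRA_NEEDED               0x0008

/* Mask of accepted study options after revision 1022. */
#define PUBLIC_STUDY_OPTIONS \
   (PCRE_STUDY_JIT_COMPILE|PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE| \
    PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE|PCRE_STUDY_EXTRA_NEEDED)

/* Hypothetical helper: returns 1 if no bit outside the mask is set. */
static int study_options_valid(int options)
{
return (options & ~PUBLIC_STUDY_OPTIONS) == 0;
}
```

Before this revision the mask stopped at 0x0004, so under the same assumption an options word containing 0x0008 would have counted as invalid.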


Modified: code/trunk/pcre_study.c
===================================================================
--- code/trunk/pcre_study.c    2012-08-27 15:49:23 UTC (rev 1021)
+++ code/trunk/pcre_study.c    2012-08-28 12:28:15 UTC (rev 1022)
@@ -1408,20 +1408,20 @@
   }


/* If a set of starting bytes has been identified, or if the minimum length is
-greater than zero, or if JIT optimization has been requested, get a
-pcre[16]_extra block and a pcre_study_data block. The study data is put in the
-latter, which is pointed to by the former, which may also get additional data
-set later by the calling program. At the moment, the size of pcre_study_data
-is fixed. We nevertheless save it in a field for returning via the
-pcre_fullinfo() function so that if it becomes variable in the future,
-we don't have to change that code. */
+greater than zero, or if JIT optimization has been requested, or if
+PCRE_STUDY_EXTRA_NEEDED is set, get a pcre[16]_extra block and a
+pcre_study_data block. The study data is put in the latter, which is pointed to
+by the former, which may also get additional data set later by the calling
+program. At the moment, the size of pcre_study_data is fixed. We nevertheless
+save it in a field for returning via the pcre_fullinfo() function so that if it
+becomes variable in the future, we don't have to change that code. */

-if (bits_set || min > 0
+if (bits_set || min > 0 || (options & (
 #ifdef SUPPORT_JIT
-    || (options & (PCRE_STUDY_JIT_COMPILE | PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
-                 | PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE)) != 0
+    PCRE_STUDY_JIT_COMPILE | PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE |
+    PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE | 
 #endif
-  )
+    PCRE_STUDY_EXTRA_NEEDED)) != 0)
   {
   extra = (PUBL(extra) *)(PUBL(malloc))
     (sizeof(PUBL(extra)) + sizeof(pcre_study_data));
@@ -1475,7 +1475,8 @@


/* If JIT support was compiled and requested, attempt the JIT compilation.
If no starting bytes were found, and the minimum length is zero, and JIT
- compilation fails, abandon the extra block and return NULL. */
+ compilation fails, abandon the extra block and return NULL, unless
+ PCRE_STUDY_EXTRA_NEEDED is set. */

 #ifdef SUPPORT_JIT
   extra->executable_jit = NULL;
@@ -1486,7 +1487,8 @@
   if ((options & PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE) != 0)
     PRIV(jit_compile)(re, extra, JIT_PARTIAL_HARD_COMPILE);


-  if (study->flags == 0 && (extra->flags & PCRE_EXTRA_EXECUTABLE_JIT) == 0)
+  if (study->flags == 0 && (extra->flags & PCRE_EXTRA_EXECUTABLE_JIT) == 0 &&
+      (options & PCRE_STUDY_EXTRA_NEEDED) == 0)
     {
 #ifdef COMPILE_PCRE8
     pcre_free_study(extra);
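
Taken together, the two pcre_study.c hunks change when a study call that hits no error hands back a block. A minimal model of that decision follows. It is a sketch, not the library code: the function name and its parameters are hypothetical, and it assumes that the study flags are zero exactly when no starting bytes were found and the minimum length is not greater than zero.

```c
#define PCRE_STUDY_EXTRA_NEEDED 0x0008  /* value from the pcre.h.in hunk */

/* Hypothetical model of the outcome of a study call that hits no error.
   bits_set:     non-zero if a set of starting bytes was found
   min:          computed minimum subject length
   options:      option bits passed by the caller
   jit_compiled: non-zero if JIT was requested and produced executable code
   Returns 1 if an extra block is returned, 0 if the result is NULL. */
static int study_returns_extra(int bits_set, int min, int options,
  int jit_compiled)
{
if (bits_set || min > 0) return 1;      /* studying found something */
if (jit_compiled) return 1;             /* JIT code lives in the block */
return (options & PCRE_STUDY_EXTRA_NEEDED) != 0;
}
```

This matches the new test cases: /.?(*THEN)/S+I still reports "Study returned NULL" because JIT compilation fails and nothing else was found, while adding the ! qualifier keeps the block.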


Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c    2012-08-27 15:49:23 UTC (rev 1021)
+++ code/trunk/pcretest.c    2012-08-28 12:28:15 UTC (rev 1022)
@@ -704,6 +704,9 @@
     PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
 };


+#define PCRE_STUDY_ALLJIT (PCRE_STUDY_JIT_COMPILE | \
+ PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE | PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE)
+
/* Textual explanations for runtime error codes */

static const char *errtexts[] = {
@@ -2796,7 +2799,7 @@
/* Look for options after final delimiter */

options = 0;
- study_options = 0;
+ study_options = force_study_options;
log_store = showstore; /* default from command line */

while (*pp != 0)
@@ -2833,12 +2836,22 @@
#endif

       case 'S':
-      if (do_study == 0)
+      do_study = 1;
+      for (;;)
         {
-        do_study = 1;
-        if (*pp == '+')
+        switch (*pp++)
           {
-          if (*(++pp) == '+')
+          case 'S':
+          do_study = 0;
+          no_force_study = 1;
+          break;
+
+          case '!':
+          study_options |= PCRE_STUDY_EXTRA_NEEDED;
+          break;
+
+          case '+':
+          if (*pp == '+')
             {
             verify_jit = TRUE;
             pp++;
@@ -2847,13 +2860,18 @@
             study_options |= jit_study_bits[*pp++ - '1'];
           else
             study_options |= jit_study_bits[6];
+          break;
+
+          case '-':
+          study_options &= ~PCRE_STUDY_ALLJIT;
+          break;
+
+          default:
+          pp--;
+          goto ENDLOOP;
           }
         }
-      else
-        {
-        do_study = 0;
-        no_force_study = 1;
-        }
+      ENDLOOP:
       break;


       case 'U': options |= PCRE_UNGREEDY; break;
@@ -3083,7 +3101,7 @@
         clock_t start_time = clock();
         for (i = 0; i < timeit; i++)
           {
-          PCRE_STUDY(extra, re, study_options | force_study_options, &error);
+          PCRE_STUDY(extra, re, study_options, &error);
           }
         time_taken = clock() - start_time;
         if (extra != NULL)
@@ -3094,7 +3112,7 @@
           (((double)time_taken * 1000.0) / (double)timeit) /
             (double)CLOCKS_PER_SEC);
         }
-      PCRE_STUDY(extra, re, study_options | force_study_options, &error);
+      PCRE_STUDY(extra, re, study_options, &error);
       if (error != NULL)
         fprintf(outfile, "Failed to study: %s\n", error);
       else if (extra != NULL)
@@ -3354,7 +3372,8 @@


         /* Show this only if the JIT was set by /S, not by -s. */


-        if ((study_options & PCRE_STUDY_JIT_COMPILE) != 0)
+        if ((study_options & PCRE_STUDY_ALLJIT) != 0 &&
+            (force_study_options & PCRE_STUDY_ALLJIT) == 0)
           {
           int jit;
           if (new_info(re, extra, PCRE_INFO_JIT, &jit) == 0)
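
The reworked /S handling in pcretest.c can be tried out as a standalone function. The sketch below follows the loop in the hunk above under some stated assumptions: the struct and function names are hypothetical, the no_force_study flag is left out, the range check on the digit after + is not visible in the diff and is assumed to be 1 to 7, and the digit is assumed to map directly onto the three JIT bits, which is what the jit_study_bits table appears to encode.

```c
#define PCRE_STUDY_JIT_COMPILE                0x0001
#define PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE   0x0002
#define PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE   0x0004
#define PCRE_STUDY_EXTRA_NEEDED               0x0008
#define PCRE_STUDY_ALLJIT (PCRE_STUDY_JIT_COMPILE | \
  PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE | PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE)

/* Hypothetical container for state that pcretest keeps in separate variables. */
typedef struct {
  int do_study;        /* 0 after a second S suppresses studying */
  int verify_jit;      /* set by "++" */
  int study_options;   /* bits to pass to the study call */
} s_modifier;

/* Parse the qualifiers that follow /S. pp points just past the 'S', and
initial_options stands in for force_study_options. Returns a pointer to the
first character that is not a qualifier. */
static const char *parse_s_qualifiers(const char *pp, int initial_options,
  s_modifier *m)
{
m->do_study = 1;
m->verify_jit = 0;
m->study_options = initial_options;
for (;;)
  {
  switch (*pp)
    {
    case 'S': m->do_study = 0; pp++; break;
    case '!': m->study_options |= PCRE_STUDY_EXTRA_NEEDED; pp++; break;
    case '-': m->study_options &= ~PCRE_STUDY_ALLJIT; pp++; break;
    case '+':
    pp++;
    if (*pp == '+') { m->verify_jit = 1; pp++; }
    if (*pp >= '1' && *pp <= '7')
      m->study_options |= *pp++ - '0';   /* assumed digit-to-bits mapping */
    else
      m->study_options |= PCRE_STUDY_ALLJIT;
    break;
    default: return pp;
    }
  }
}
```

Because study_options now starts from the forced options rather than zero, a minus qualifier can clear JIT bits that were requested with -s, which the old code could not express.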


Modified: code/trunk/testdata/testinput12
===================================================================
--- code/trunk/testdata/testinput12    2012-08-27 15:49:23 UTC (rev 1021)
+++ code/trunk/testdata/testinput12    2012-08-28 12:28:15 UTC (rev 1022)
@@ -76,8 +76,14 @@
     ab\P
     ab\P\P
     xyz
+    
+/abcd/S++2I 


 /(*NO_START_OPT)a(*:m)b/KS++
     a


+/.?(*THEN)/S+I
+
+/.?(*THEN)/S!+I
+
/-- End of testinput12 --/

Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2012-08-27 15:49:23 UTC (rev 1021)
+++ code/trunk/testdata/testinput2    2012-08-28 12:28:15 UTC (rev 1022)
@@ -3770,7 +3770,7 @@
     ac 


/-- These are all run as real matches in test 1; here we are just checking the
-settings of the anchored and startline bits. */
+settings of the anchored and startline bits. --/

/(?>.*?a)(?<=ba)/I

@@ -3804,4 +3804,10 @@

/(?:.*abc)/mI

+/-- Check PCRE_STUDY_EXTRA_NEEDED --/
+
+/.?/S-I
+
+/.?/S!I
+
/-- End of testinput2 --/

Modified: code/trunk/testdata/testoutput12
===================================================================
--- code/trunk/testdata/testoutput12    2012-08-27 15:49:23 UTC (rev 1021)
+++ code/trunk/testdata/testoutput12    2012-08-28 12:28:15 UTC (rev 1022)
@@ -147,9 +147,35 @@
 Partial match: ab (JIT)
     xyz
 No match (JIT)
+    
+/abcd/S++2I 
+Capturing subpattern count = 0
+No options
+First char = 'a'
+Need char = 'd'
+Subject length lower bound = 4
+No set of starting bytes
+JIT study was successful


 /(*NO_START_OPT)a(*:m)b/KS++
     a
 No match, mark = m (JIT)


+/.?(*THEN)/S+I
+Capturing subpattern count = 0
+No options
+No first char
+No need char
+Study returned NULL
+JIT study was not successful
+
+/.?(*THEN)/S!+I
+Capturing subpattern count = 0
+No options
+No first char
+No need char
+Subject length lower bound = -1
+No set of starting bytes
+JIT study was not successful
+
/-- End of testinput12 --/

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2012-08-27 15:49:23 UTC (rev 1021)
+++ code/trunk/testdata/testoutput2    2012-08-28 12:28:15 UTC (rev 1022)
@@ -12362,7 +12362,7 @@
  0: ac


/-- These are all run as real matches in test 1; here we are just checking the
-settings of the anchored and startline bits. */
+settings of the anchored and startline bits. --/

/(?>.*?a)(?<=ba)/I
Capturing subpattern count = 0
@@ -12464,4 +12464,21 @@
First char at start or follows newline
Need char = 'c'

+/-- Check PCRE_STUDY_EXTRA_NEEDED --/
+
+/.?/S-I
+Capturing subpattern count = 0
+No options
+No first char
+No need char
+Study returned NULL
+
+/.?/S!I
+Capturing subpattern count = 0
+No options
+No first char
+No need char
+Subject length lower bound = -1
+No set of starting bytes
+
/-- End of testinput2 --/