[Pcre-svn] [1110] code/trunk: Fix minimum length bug for pat…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [1110] code/trunk: Fix minimum length bug for patterns containing (*ACCEPT).
Revision: 1110
          http://www.exim.org/viewvc/pcre2?view=rev&revision=1110
Author:   ph10
Date:     2019-06-18 17:07:43 +0100 (Tue, 18 Jun 2019)
Log Message:
-----------
Fix minimum length bug for patterns containing (*ACCEPT).


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/src/pcre2_compile.c
    code/trunk/src/pcre2_internal.h
    code/trunk/src/pcre2_study.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2019-06-18 08:29:43 UTC (rev 1109)
+++ code/trunk/ChangeLog    2019-06-18 16:07:43 UTC (rev 1110)
@@ -31,8 +31,14 @@
 9. Some changes to the way the minimum subject length is handled:


    * When PCRE2_NO_START_OPTIMIZE is set, no minimum length is computed; 
-     pcre2test omits this item instead of showing a value of zero.
+     pcre2test now omits this item instead of showing a value of zero.


+   * An incorrect minimum length could be calculated for a pattern that 
+     contained (*ACCEPT) inside a qualified group whose minimum repetition was 
+     zero, for example /A(?:(*ACCEPT))?B/, which incorrectly computed a minimum
+     of 2. The minimum length scan no longer happens for a pattern that 
+     contains (*ACCEPT).
+     
    * When no minimum length is set by the normal scan, but a first and/or last 
      code unit is recorded, set the minimum to 1 or 2 as appropriate.



Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c    2019-06-18 08:29:43 UTC (rev 1109)
+++ code/trunk/src/pcre2_compile.c    2019-06-18 16:07:43 UTC (rev 1110)
@@ -10039,8 +10039,9 @@


 if (cb.had_accept)
   {
-  reqcu = 0;              /* Must disable after (*ACCEPT) */
+  reqcu = 0;                     /* Must disable after (*ACCEPT) */
   reqcuflags = REQ_NONE;
+  re->flags |= PCRE2_HASACCEPT;  /* Disables minimum length */ 
   }


/* Fill in the final opcode and check for disastrous overflow. If no overflow,

Modified: code/trunk/src/pcre2_internal.h
===================================================================
--- code/trunk/src/pcre2_internal.h    2019-06-18 08:29:43 UTC (rev 1109)
+++ code/trunk/src/pcre2_internal.h    2019-06-18 16:07:43 UTC (rev 1110)
@@ -517,6 +517,7 @@
 #define PCRE2_HASBKPORX     0x00100000  /* contains \P, \p, or \X */
 #define PCRE2_DUPCAPUSED    0x00200000  /* contains (?| */
 #define PCRE2_HASBKC        0x00400000  /* contains \C */
+#define PCRE2_HASACCEPT     0x00800000  /* contains (*ACCEPT) */


 #define PCRE2_MODE_MASK     (PCRE2_MODE8 | PCRE2_MODE16 | PCRE2_MODE32)



Modified: code/trunk/src/pcre2_study.c
===================================================================
--- code/trunk/src/pcre2_study.c    2019-06-18 08:29:43 UTC (rev 1109)
+++ code/trunk/src/pcre2_study.c    2019-06-18 16:07:43 UTC (rev 1110)
@@ -1607,13 +1607,13 @@
   }


/* Find the minimum length of subject string. If the pattern can match an empty
-string, the minimum length is already known. If there are more back references
-than the size of the vector we are going to cache them in, do nothing. A
-pattern that complicated will probably take a long time to analyze and may in
-any case turn out to be too complicated. Note that back reference minima are
-held as 16-bit numbers. */
+string, the minimum length is already known. If the pattern contains (*ACCEPT)
+all bets are off. If there are more back references than the size of the vector
+we are going to cache them in, do nothing. A pattern that complicated will
+probably take a long time to analyze and may in any case turn out to be too
+complicated. Note that back reference minima are held as 16-bit numbers. */

-if ((re->flags & PCRE2_MATCH_EMPTY) == 0 &&
+if ((re->flags & (PCRE2_MATCH_EMPTY|PCRE2_HASACCEPT)) == 0 &&
      re->top_backref <= MAX_CACHE_BACKREF)
   {
   int backref_cache[MAX_CACHE_BACKREF+1];


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2019-06-18 08:29:43 UTC (rev 1109)
+++ code/trunk/testdata/testinput2    2019-06-18 16:07:43 UTC (rev 1110)
@@ -5623,4 +5623,6 @@


/((?=a))[abcd]/I

+/A(?:(*ACCEPT))?B/info
+
# End of testinput2

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2019-06-18 08:29:43 UTC (rev 1109)
+++ code/trunk/testdata/testoutput2    2019-06-18 16:07:43 UTC (rev 1110)
@@ -17026,6 +17026,11 @@
 First code unit = 'a'
 Subject length lower bound = 1


+/A(?:(*ACCEPT))?B/info
+Capture group count = 0
+First code unit = 'A'
+Subject length lower bound = 1
+
# End of testinput2
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data