[Pcre-svn] [1123] code/trunk: Fix partial matching bug in p…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [1123] code/trunk: Fix partial matching bug in pcre2_dfa_match().
Revision: 1123
          http://www.exim.org/viewvc/pcre2?view=rev&revision=1123
Author:   ph10
Date:     2019-06-26 17:13:28 +0100 (Wed, 26 Jun 2019)
Log Message:
-----------
Fix partial matching bug in pcre2_dfa_match().


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/src/pcre2_dfa_match.c
    code/trunk/testdata/testinput6
    code/trunk/testdata/testoutput6


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2019-06-26 08:23:47 UTC (rev 1122)
+++ code/trunk/ChangeLog    2019-06-26 16:13:28 UTC (rev 1123)
@@ -5,7 +5,7 @@
 Version 10.34 22-April-2019
 ---------------------------


-1. The maximum number of capturing subpatterns is 65535 (documented), but no
+1. The maximum number of capturing subpatterns is 65535 (documented), but no
check on this was ever implemented. This omission has been rectified; it fixes
ClusterFuzz 14376.

@@ -25,40 +25,40 @@
7. Adjust the limit for "must have" code unit searching, in particular,
increase it substantially for non-anchored patterns.

-8. Allow (*ACCEPT) to be quantified, because an ungreedy quantifier with a zero
+8. Allow (*ACCEPT) to be quantified, because an ungreedy quantifier with a zero
minimum is potentially useful.

9. Some changes to the way the minimum subject length is handled:

-   * When PCRE2_NO_START_OPTIMIZE is set, no minimum length is computed; 
+   * When PCRE2_NO_START_OPTIMIZE is set, no minimum length is computed;
      pcre2test now omits this item instead of showing a value of zero.
-     
-   * An incorrect minimum length could be calculated for a pattern that 
-     contained (*ACCEPT) inside a qualified group whose minimum repetition was 
+
+   * An incorrect minimum length could be calculated for a pattern that
+     contained (*ACCEPT) inside a qualified group whose minimum repetition was
      zero, for example /A(?:(*ACCEPT))?B/, which incorrectly computed a minimum
-     of 2. The minimum length scan no longer happens for a pattern that 
+     of 2. The minimum length scan no longer happens for a pattern that
      contains (*ACCEPT).
-     
-   * When no minimum length is set by the normal scan, but a first and/or last 
+
+   * When no minimum length is set by the normal scan, but a first and/or last
      code unit is recorded, set the minimum to 1 or 2 as appropriate.
-     
+
    * When a pattern contains multiple groups with the same number, a back
      reference cannot know which one to scan for a minimum length. This used to
      cause the minimum length finder to give up with no result. Now it treats
-     such references as not adding to the minimum length (which it should have 
+     such references as not adding to the minimum length (which it should have
      done all along).
-     
-   * Furthermore, the above action now happens only if the back reference is to 
-     a group that exists more than once in a pattern instead of any back 
-     reference in a pattern with duplicate numbers.  
-     
-10. A (*MARK) value inside a successful condition was not being returned by the 
+
+   * Furthermore, the above action now happens only if the back reference is to
+     a group that exists more than once in a pattern instead of any back
+     reference in a pattern with duplicate numbers.
+
+10. A (*MARK) value inside a successful condition was not being returned by the
 interpretive matcher (it was returned by JIT). This bug has been mended.


-11. A bug in pcre2grep meant that -o without an argument (or -o0) didn't work
-if the pattern had more than 32 capturing parentheses. This is fixed. In
-addition (a) the default limit for groups requested by -o<n> has been raised to
-50, (b) the new --om-capture option changes the limit, (c) an error is raised
+11. A bug in pcre2grep meant that -o without an argument (or -o0) didn't work
+if the pattern had more than 32 capturing parentheses. This is fixed. In
+addition (a) the default limit for groups requested by -o<n> has been raised to
+50, (b) the new --om-capture option changes the limit, (c) an error is raised
if -o asks for a group that is above the limit.

12. The quantifier {1} was always being ignored, but this is incorrect when it
@@ -66,13 +66,13 @@
parenthesized item may contain multiple branches or other backtracking points,
for example /(a|ab){1}+c/ or /(a+){1}+a/.

-13. Nested lookbehinds are now taken into account when computing the maximum
-lookbehind value. For example /(?<=a(?<=ba)c)/ previously set a maximum
-lookbehind of 2, because that is the largest individual lookbehind. Now it sets
+13. Nested lookbehinds are now taken into account when computing the maximum
+lookbehind value. For example /(?<=a(?<=ba)c)/ previously set a maximum
+lookbehind of 2, because that is the largest individual lookbehind. Now it sets
it to 3, because matching looks back 3 characters.

-14. For partial matches, pcre2test was always showing the maximum lookbehind
-characters, flagged with "<", which is misleading when the lookbehind didn't
+14. For partial matches, pcre2test was always showing the maximum lookbehind
+characters, flagged with "<", which is misleading when the lookbehind didn't
actually look behind the start (because it was later in the pattern). Showing
all consulted preceding characters for partial matches is now controlled by the
existing "allusedtext" modifier and, as for complete matches, this facility is
@@ -79,7 +79,11 @@
available only for non-JIT matching, because JIT does not maintain the first
and last consulted characters.

+15. DFA matching (using pcre2_dfa_match()) was not recognising a partial match
+if the end of the subject was encountered in a lookahead (conditional or
+otherwise), an atomic group, or a recursion.

+
Version 10.33 16-April-2019
---------------------------


Modified: code/trunk/src/pcre2_dfa_match.c
===================================================================
--- code/trunk/src/pcre2_dfa_match.c    2019-06-26 08:23:47 UTC (rev 1122)
+++ code/trunk/src/pcre2_dfa_match.c    2019-06-26 16:13:28 UTC (rev 1123)
@@ -3152,8 +3152,8 @@


/* We have finished the processing at the current subject character. If no
new states have been set for the next character, we have found all the
- matches that we are going to find. If we are at the top level and partial
- matching has been requested, check for appropriate conditions.
+ matches that we are going to find. If partial matching has been requested,
+ check for appropriate conditions.

The "forced_ fail" variable counts the number of (*F) encountered for the
character. If it is equal to the original active_count (saved in
@@ -3165,8 +3165,7 @@

   if (new_count <= 0)
     {
-    if (rlevel == 1 &&                               /* Top level, and */
-        could_continue &&                            /* Some could go on, and */
+    if (could_continue &&                            /* Some could go on, and */
         forced_fail != workspace[1] &&               /* Not all forced fail & */
         (                                            /* either... */
         (mb->moptions & PCRE2_PARTIAL_HARD) != 0      /* Hard partial */
@@ -3175,8 +3174,8 @@
          match_count < 0)                            /* no matches */
         ) &&                                         /* And... */
         (
-        partial_newline ||                           /* Either partial NL */
-          (                                          /* or ... */
+        partial_newline ||                     /* Either partial NL */
+          (                                    /* or ... */
           ptr >= end_subject &&                /* End of subject and */
           ptr > mb->start_used_ptr)            /* Inspected non-empty string */
           )


Modified: code/trunk/testdata/testinput6
===================================================================
--- code/trunk/testdata/testinput6    2019-06-26 08:23:47 UTC (rev 1122)
+++ code/trunk/testdata/testinput6    2019-06-26 16:13:28 UTC (rev 1123)
@@ -4972,4 +4972,26 @@
 \= Expect no match
     0


+/(?<=pqr)abc(?=xyz)/
+    123pqrabcxy\=ps,allusedtext
+    123pqrabcxyz\=ps,allusedtext
+
+/(?>a+b)/
+    aaaa\=ps
+    aaaab\=ps
+    
+/(abc)(?1)/
+    abca\=ps
+    abcabc\=ps
+
+/(?(?=abc).*|Z)/
+    ab\=ps
+    abcxyz\=ps
+
+/(abc)++x/
+    abcab\=ps
+    abc\=ps 
+    ab\=ps
+    abcx  
+
 # End of testinput6


Modified: code/trunk/testdata/testoutput6
===================================================================
--- code/trunk/testdata/testoutput6    2019-06-26 08:23:47 UTC (rev 1122)
+++ code/trunk/testdata/testoutput6    2019-06-26 16:13:28 UTC (rev 1123)
@@ -7809,4 +7809,40 @@
     0
 No match


+/(?<=pqr)abc(?=xyz)/
+    123pqrabcxy\=ps,allusedtext
+Partial match: pqrabcxy
+               <<<
+    123pqrabcxyz\=ps,allusedtext
+ 0: pqrabcxyz
+    <<<   >>>
+
+/(?>a+b)/
+    aaaa\=ps
+Partial match: aaaa
+    aaaab\=ps
+ 0: aaaab
+    
+/(abc)(?1)/
+    abca\=ps
+Partial match: abca
+    abcabc\=ps
+ 0: abcabc
+
+/(?(?=abc).*|Z)/
+    ab\=ps
+Partial match: ab
+    abcxyz\=ps
+ 0: abcxyz
+
+/(abc)++x/
+    abcab\=ps
+Partial match: abcab
+    abc\=ps 
+Partial match: abc
+    ab\=ps
+Partial match: ab
+    abcx  
+ 0: abcx
+
 # End of testinput6