[Pcre-svn] [1302] code/trunk: Further changes to backtrackin…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [1302] code/trunk: Further changes to backtracking verbs in assertions.
Revision: 1302
          http://vcs.pcre.org/viewvc?view=rev&revision=1302
Author:   ph10
Date:     2013-03-27 11:13:36 +0000 (Wed, 27 Mar 2013)


Log Message:
-----------
Further changes to backtracking verbs in assertions.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcrepattern.3
    code/trunk/pcre_exec.c
    code/trunk/testdata/testinput1
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput1
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2013-03-26 16:36:40 UTC (rev 1301)
+++ code/trunk/ChangeLog    2013-03-27 11:13:36 UTC (rev 1302)
@@ -110,7 +110,7 @@


30. Update RunTest with additional test selector options.

-31. The way PCRE handles backtracking verbs has been changed in to ways.
+31. The way PCRE handles backtracking verbs has been changed in two ways.

     (1) Previously, in something like (*COMMIT)(*SKIP), COMMIT would override
     SKIP. Now, PCRE acts on whichever backtracking verb is reached first by
@@ -118,8 +118,8 @@
     rather obscure rules do not always do the same thing.


     (2) Previously, backtracking verbs were confined within assertions. This is 
-    no longer the case, except for (*ACCEPT). Again, this sometimes improves
-    Perl compatibility, and sometimes does not.
+    no longer the case for positive assertions, except for (*ACCEPT). Again,
+    this sometimes improves Perl compatibility, and sometimes does not.


 32. A number of tests that were in test 2 because Perl did things differently 
     have been moved to test 1, because either Perl or PCRE has changed, and 


Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2013-03-26 16:36:40 UTC (rev 1301)
+++ code/trunk/doc/pcrepattern.3    2013-03-27 11:13:36 UTC (rev 1302)
@@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "22 March 2013" "PCRE 8.33"
+.TH PCREPATTERN 3 "27 March 2013" "PCRE 8.33"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -2673,13 +2673,13 @@
 .P
 The new verbs make use of what was previously invalid syntax: an opening
 parenthesis followed by an asterisk. They are generally of the form
-(*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,
-depending on whether or not a name is present. A name is any sequence of
-characters that does not include a closing parenthesis. The maximum length of
-name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit libraries.
-If the name is empty, that is, if the closing parenthesis immediately follows
-the colon, the effect is as if the colon were not there. Any number of these
-verbs may occur in a pattern.
+(*VERB) or (*VERB:NAME). Some may take either form, possibly behaving 
+differently depending on whether or not a name is present. A name is any
+sequence of characters that does not include a closing parenthesis. The maximum
+length of name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit
+libraries. If the name is empty, that is, if the closing parenthesis
+immediately follows the colon, the effect is as if the colon were not there.
+Any number of these verbs may occur in a pattern.
 .P
 Since these verbs are specifically related to backtracking, most of them can be
 used only when the pattern is to be matched using one of the traditional
@@ -2807,9 +2807,9 @@
 of obtaining this information than putting each alternative in its own
 capturing parentheses.
 .P
-If a verb with a name is encountered in a positive assertion, its name is
-recorded and passed back if it is the last-encountered. This does not happen
-for negative assertions.
+If a verb with a name is encountered in a positive assertion that is true, the
+name is recorded and passed back if it is the last-encountered. This does not
+happen for negative assertions or failing positive assertions.
 .P
 After a partial match or a failed match, the last encountered name in the
 entire match process is returned. For example:
@@ -2839,14 +2839,16 @@
 with what follows, but if there is no subsequent match, causing a backtrack to
 the verb, a failure is forced. That is, backtracking cannot pass to the left of
 the verb. However, when one of these verbs appears inside an atomic group or an 
-assertion, its effect is confined to that group, because once the group has
-been matched, there is never any backtracking into it. In this situation,
-backtracking can "jump back" to the left of the entire atomic group or 
-assertion. (Remember also, as stated above, that this localization also applies
-in subroutine calls.)
+assertion that is true, its effect is confined to that group, because once the
+group has been matched, there is never any backtracking into it. In this
+situation, backtracking can "jump back" to the left of the entire atomic group
+or assertion. (Remember also, as stated above, that this localization also
+applies in subroutine calls.)
 .P
 These verbs differ in exactly what kind of failure occurs when backtracking
-reaches them.
+reaches them. The behaviour described below is what happens when the verb is
+not in a subroutine or an assertion. Subsequent sections cover these special 
+cases.
 .sp
   (*COMMIT)
 .sp
@@ -2942,8 +2944,10 @@
 .sp
 If the COND1 pattern matches, FOO is tried (and possibly further items after
 the end of the group if FOO succeeds); on failure, the matcher skips to the
-second alternative and tries COND2, without backtracking into COND1.
-If (*THEN) is not inside an alternation, it acts like (*PRUNE).
+second alternative and tries COND2, without backtracking into COND1. If that
+succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no
+more alternatives, so there is a backtrack to whatever came before the entire
+group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
 .P
 The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN). 
 It is like (*MARK:NAME) in that the name is remembered for passing back to the
@@ -3039,10 +3043,18 @@
 further processing. In a negative assertion, (*ACCEPT) causes the assertion to 
 fail without any further processing.
 .P
-The other backtracking verbs are not treated specially if they appear in an
-assertion. In particular, (*THEN) skips to the next alternative in the
+The other backtracking verbs are not treated specially if they appear in a
+positive assertion. In particular, (*THEN) skips to the next alternative in the
 innermost enclosing group that has alternations, whether or not this is within
 the assertion.
+.P
+Negative assertions are, however, different, in order to ensure that changing a
+positive assertion into a negative assertion changes its result. Backtracking
+into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative assertion to be true, 
+without considering any further alternative branches in the assertion. 
+Backtracking into (*THEN) causes it to skip to the next enclosing alternative
+within the assertion (the normal behaviour), but if the assertion does not have 
+such an alternative, (*THEN) behaves like (*PRUNE).
 .
 .
 .\" HTML <a name="btsub"></a>
@@ -3088,6 +3100,6 @@
 .rs
 .sp
 .nf
-Last updated: 22 March 2013
+Last updated: 27 March 2013
 Copyright (c) 1997-2013 University of Cambridge.
 .fi


Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c    2013-03-26 16:36:40 UTC (rev 1301)
+++ code/trunk/pcre_exec.c    2013-03-27 11:13:36 UTC (rev 1302)
@@ -1608,11 +1608,18 @@
     do
       {
       RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, NULL, RM4);
+      
+      /* A match means that the assertion is true; break out of the loop
+      that matches its alternatives. */
+        
       if (rrc == MATCH_MATCH || rrc == MATCH_ACCEPT)
         {
         mstart = md->start_match_ptr;   /* In case \K reset it */
         break;
         }
+        
+      /* If not matched, restore the previous mark setting. */
+ 
       md->mark = save_mark;


       /* See comment in the code for capturing groups above about handling
@@ -1626,17 +1633,19 @@
           rrc = MATCH_NOMATCH;
         }


-      /* Anything other than NOMATCH causes the assertion to fail. This 
-      includes COMMIT, SKIP, and PRUNE. However, this consistent approach does 
-      not always have exactly the same effect as in Perl. */
+      /* Anything other than NOMATCH causes the entire assertion to fail,
+      passing back the return code. This includes COMMIT, SKIP, PRUNE and an
+      uncaptured THEN, which means they take their normal effect. This
+      consistent approach does not always have exactly the same effect as in
+      Perl. */


       if (rrc != MATCH_NOMATCH) RRETURN(rrc);
       ecode += GET(ecode, 1);
       }
-    while (*ecode == OP_ALT);
+    while (*ecode == OP_ALT);   /* Continue for next alternative */


     /* If we have tried all the alternative branches, the assertion has
-    failed. */ 
+    failed. If not, we broke out after a match. */ 


     if (*ecode == OP_KET) RRETURN(MATCH_NOMATCH);


@@ -1670,35 +1679,57 @@
     do
       {
       RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, NULL, RM5);
-      md->mark = save_mark;
+      md->mark = save_mark;   /* Always restore the mark setting */


-      /* A successful match means the assertion has failed. */
-       
-      if (rrc == MATCH_MATCH || rrc == MATCH_ACCEPT) RRETURN(MATCH_NOMATCH);
+      switch(rrc)
+        {
+        case MATCH_MATCH:            /* A successful match means */
+        case MATCH_ACCEPT:           /* the assertion has failed. */
+        RRETURN(MATCH_NOMATCH);
+        
+        case MATCH_NOMATCH:          /* Carry on with next branch */
+        break;  


-      /* See comment in the code for capturing groups above about handling
-      THEN. */
+        /* See comment in the code for capturing groups above about handling
+        THEN. */


-      if (rrc == MATCH_THEN)
-        {
+        case MATCH_THEN:
         next = ecode + GET(ecode,1);
         if (md->start_match_ptr < next &&
             (*ecode == OP_ALT || *next == OP_ALT))
+          {   
           rrc = MATCH_NOMATCH;
+          break;
+          }
+        /* Otherwise fall through. */  
+  
+        /* COMMIT, SKIP, PRUNE, and an uncaptured THEN cause the whole
+        assertion to fail to match, without considering any more alternatives.
+        Failing to match means the assertion is true. This is a consistent
+        approach, but does not always have the same effect as in Perl. */
+
+        case MATCH_COMMIT:
+        case MATCH_SKIP:
+        case MATCH_SKIP_ARG: 
+        case MATCH_PRUNE:
+        do ecode += GET(ecode,1); while (*ecode == OP_ALT);
+        goto NEG_ASSERT_TRUE;   /* Break out of alternation loop */
+      
+        /* Anything else is an error */
+         
+        default:
+        RRETURN(rrc); 
         }


-      /* No match on a branch means we must carry on and try the next branch. 
-      Anything else, in particular, SKIP, PRUNE, etc. causes a failure in the 
-      enclosing branch. This is a consistent approach, but does not always have 
-      the same effect as in Perl. */ 
-
-      if (rrc != MATCH_NOMATCH) RRETURN(rrc);
+      /* Continue with next branch */
+       
       ecode += GET(ecode,1);
       }
     while (*ecode == OP_ALT);


     /* All branches in the assertion failed to match. */
-
+    
+    NEG_ASSERT_TRUE:
     if (condassert) RRETURN(MATCH_MATCH);  /* Condition assertion */
     ecode += 1 + LINK_SIZE;                /* Continue with current branch */
     continue;


Modified: code/trunk/testdata/testinput1
===================================================================
--- code/trunk/testdata/testinput1    2013-03-26 16:36:40 UTC (rev 1301)
+++ code/trunk/testdata/testinput1    2013-03-27 11:13:36 UTC (rev 1302)
@@ -5521,5 +5521,32 @@
     ac


 /--------/
+
+/(?(?!b(*THEN)a)bn|bnn)/
+   bnn 
+
+/(?!b(*SKIP)a)bn|bnn/
+    bnn


+/(?(?!b(*SKIP)a)bn|bnn)/
+   bnn 
+
+/(?!b(*PRUNE)a)bn|bnn/
+    bnn
+    
+/(?(?!b(*PRUNE)a)bn|bnn)/
+   bnn 
+   
+/(?!b(*COMMIT)a)bn|bnn/
+    bnn
+    
+/(?(?!b(*COMMIT)a)bn|bnn)/
+   bnn 
+
+/(?=b(*SKIP)a)bn|bnn/
+    bnn
+
+/(?=b(*THEN)a)bn|bnn/
+    bnn
+    
 /-- End of testinput1 --/


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2013-03-26 16:36:40 UTC (rev 1301)
+++ code/trunk/testdata/testinput2    2013-03-27 11:13:36 UTC (rev 1302)
@@ -3817,7 +3817,16 @@


 /^(A(*THEN)B|A(*THEN)D)/
     AD           
+    
+/(?!b(*THEN)a)bn|bnn/
+    bnn


+/(?(?=b(*SKIP)a)bn|bnn)/
+    bnn
+
+/(?=b(*THEN)a|)bn|bnn/
+    bnn
+
 /-------------------------/ 


/-- End of testinput2 --/

Modified: code/trunk/testdata/testoutput1
===================================================================
--- code/trunk/testdata/testoutput1    2013-03-26 16:36:40 UTC (rev 1301)
+++ code/trunk/testdata/testoutput1    2013-03-27 11:13:36 UTC (rev 1302)
@@ -9079,5 +9079,41 @@
 No match


 /--------/
+
+/(?(?!b(*THEN)a)bn|bnn)/
+   bnn 
+ 0: bn
+
+/(?!b(*SKIP)a)bn|bnn/
+    bnn
+ 0: bn


+/(?(?!b(*SKIP)a)bn|bnn)/
+   bnn 
+ 0: bn
+
+/(?!b(*PRUNE)a)bn|bnn/
+    bnn
+ 0: bn
+    
+/(?(?!b(*PRUNE)a)bn|bnn)/
+   bnn 
+ 0: bn
+   
+/(?!b(*COMMIT)a)bn|bnn/
+    bnn
+ 0: bn
+    
+/(?(?!b(*COMMIT)a)bn|bnn)/
+   bnn 
+ 0: bn
+
+/(?=b(*SKIP)a)bn|bnn/
+    bnn
+No match
+
+/(?=b(*THEN)a)bn|bnn/
+    bnn
+ 0: bnn
+    
 /-- End of testinput1 --/


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2013-03-26 16:36:40 UTC (rev 1301)
+++ code/trunk/testdata/testoutput2    2013-03-27 11:13:36 UTC (rev 1302)
@@ -12520,37 +12520,37 @@


  /^(?!a(*SKIP)b)/
      ac
-No match
+ 0: 


  /^(?!a(*SKIP)b)../
      acd
-No match
+ 0: ac


 /(?!a(*SKIP)b)../
      acd
- 0: cd
+ 0: ac


 /^(?(?!a(*SKIP)b))/
      ac
-No match
+ 0: 


 /^(?!a(*PRUNE)b)../
      acd
-No match
+ 0: ac


 /(?!a(*PRUNE)b)../
      acd
- 0: cd
+ 0: ac


  /(?!a(*COMMIT)b)ac|cd/
      ac
-No match
+ 0: ac


  /(?!a(*COMMIT)b)ac|ad/
      ac
-No match
+ 0: ac
      ad 
-No match
+ 0: ad


 /^(?!a(*THEN)b|ac)../
      ac
@@ -12596,7 +12596,19 @@
     AD           
  0: AD
  1: AD
+    
+/(?!b(*THEN)a)bn|bnn/
+    bnn
+ 0: bn


+/(?(?=b(*SKIP)a)bn|bnn)/
+    bnn
+No match
+
+/(?=b(*THEN)a|)bn|bnn/
+    bnn
+ 0: bn
+
 /-------------------------/ 


/-- End of testinput2 --/
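
For readers following the pcre_exec.c hunk above, the new negative-assertion switch can be modelled outside PCRE roughly as follows. This is a minimal sketch, not code from the commit: the enum names, the negative_assert() function, and the idea of passing in one precomputed result per alternative are all assumptions made for illustration. The real code also checks md->start_match_ptr before deciding that a THEN belongs to this alternation; the sketch simplifies that to "the assertion has more than one alternative".

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the MATCH_* return codes in pcre_exec.c.
   The names and values are hypothetical, not PCRE's own. */
enum branch_result {
    BR_MATCH, BR_ACCEPT, BR_NOMATCH, BR_THEN,
    BR_COMMIT, BR_SKIP, BR_SKIP_ARG, BR_PRUNE, BR_ERROR
};

enum assert_outcome { ASSERT_TRUE, ASSERT_FALSE, ASSERT_ERROR };

/* Decide the outcome of a negative assertion, given the result of trying
   each of its n alternatives in order (n must be at least 1). */
enum assert_outcome negative_assert(const enum branch_result *branches,
                                    size_t n)
{
    assert(branches != NULL && n > 0);

    for (size_t i = 0; i < n; i++) {
        switch (branches[i]) {
        case BR_MATCH:
        case BR_ACCEPT:
            /* A branch matched, so the negative assertion is false. */
            return ASSERT_FALSE;

        case BR_NOMATCH:
            /* Carry on with the next alternative. */
            break;

        case BR_THEN:
            /* Simplifying assumption: with more than one alternative,
               THEN is handled here and acts like a plain branch failure. */
            if (n > 1) break;
            /* Otherwise THEN behaves like PRUNE. */
            /* fall through */

        case BR_COMMIT:
        case BR_SKIP:
        case BR_SKIP_ARG:
        case BR_PRUNE:
            /* The assertion as a whole fails to match, without trying any
               more alternatives, so the negative assertion is true. */
            return ASSERT_TRUE;

        default:
            /* Anything else is an error. */
            return ASSERT_ERROR;
        }
    }

    /* Every alternative failed to match: the negative assertion is true. */
    return ASSERT_TRUE;
}
```

Under this model, the new test /(?!b(*SKIP)a)bn|bnn/ against "bnn" corresponds to a single alternative that backtracks into SKIP: the assertion is reported true, which is consistent with the recorded output "0: bn".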