[Pcre-svn] [1418] code/trunk: Fix pcretest's handling of pa…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [1418] code/trunk: Fix pcretest's handling of patterns when \K in an assertion sets the start of a
Revision: 1418
          http://vcs.pcre.org/viewvc?view=rev&revision=1418
Author:   ph10
Date:     2013-12-27 12:23:25 +0000 (Fri, 27 Dec 2013)


Log Message:
-----------
Fix pcretest's handling of patterns when \K in an assertion sets the start of a
match past the end of the match.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcrepattern.3
    code/trunk/doc/pcresyntax.3
    code/trunk/pcretest.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2013-12-24 18:03:06 UTC (rev 1417)
+++ code/trunk/ChangeLog    2013-12-27 12:23:25 UTC (rev 1418)
@@ -14,6 +14,12 @@


 3.  Got rid of some compiler warnings for potentially uninitialized variables 
     that show up only when compiled with -O2. 
+    
+4.  A pattern such as (?=ab\K) that uses \K in an assertion can set the start
+    of a match later than the end of the match. The pcretest program was not
+    handling the case sensibly - it was outputting from the start to the next 
+    binary zero. It now reports this situation in a message, and outputs the 
+    text from the end to the start.



Version 8.34 15-December-2013

Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2013-12-24 18:03:06 UTC (rev 1417)
+++ code/trunk/doc/pcrepattern.3    2013-12-27 12:23:25 UTC (rev 1418)
@@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "03 December 2013" "PCRE 8.34"
+.TH PCREPATTERN 3 "27 December 2013" "PCRE 8.35"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -1004,7 +1004,9 @@
 .P
 Perl documents that the use of \eK within assertions is "not well defined". In
 PCRE, \eK is acted upon when it occurs inside positive assertions, but is
-ignored in negative assertions.
+ignored in negative assertions. Note that when a pattern such as (?=ab\eK)
+matches, the reported start of the match can be greater than the end of the 
+match.
 .
 .
 .\" HTML <a name="smallassertions"></a>
@@ -3255,6 +3257,6 @@
 .rs
 .sp
 .nf
-Last updated: 03 December 2013
+Last updated: 27 December 2013
 Copyright (c) 1997-2013 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcresyntax.3
===================================================================
--- code/trunk/doc/pcresyntax.3    2013-12-24 18:03:06 UTC (rev 1417)
+++ code/trunk/doc/pcresyntax.3    2013-12-27 12:23:25 UTC (rev 1418)
@@ -1,4 +1,4 @@
-.TH PCRESYNTAX 3 "12 November 2013" "PCRE 8.34"
+.TH PCRESYNTAX 3 "27 December 2013" "PCRE 8.35"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -309,6 +309,8 @@
 .rs
 .sp
   \eK          reset start of match
+.sp
+\eK is honoured in positive assertions, but ignored in negative ones.   
 .
 .
 .SH "ALTERNATION"
@@ -508,6 +510,6 @@
 .rs
 .sp
 .nf
-Last updated: 12 November 2013
+Last updated: 27 December 2013
 Copyright (c) 1997-2013 University of Cambridge.
 .fi


Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c    2013-12-24 18:03:06 UTC (rev 1417)
+++ code/trunk/pcretest.c    2013-12-27 12:23:25 UTC (rev 1418)
@@ -5192,7 +5192,8 @@
           if (count * 2 > use_size_offsets) count = use_size_offsets/2;
           }


-        /* Output the captured substrings */
+        /* Output the captured substrings. Note that, for the matched string, 
+        the use of \K in an assertion can make the start later than the end. */


         for (i = 0; i < count * 2; i += 2)
           {
@@ -5208,11 +5209,25 @@
             }
           else
             {
+            int start = use_offsets[i];
+            int end = use_offsets[i+1];
+               
+            if (start > end)
+              {
+              start = use_offsets[i+1];
+              end = use_offsets[i];
+              fprintf(outfile, "Start of matched string is beyond its end - " 
+                "displaying from end to start.\n"); 
+              }  
+ 
             fprintf(outfile, "%2d: ", i/2);
-            PCHARSV(bptr, use_offsets[i],
-              use_offsets[i+1] - use_offsets[i], outfile);
+            PCHARSV(bptr, start, end - start, outfile);
             if (verify_jit && jit_was_used) fprintf(outfile, " (JIT)");
             fprintf(outfile, "\n");
+            
+            /* Note: don't use the start/end variables here because we want to
+            show the text from what is reported as the end. */
+             
             if (do_showcaprest || (i == 0 && do_showrest))
               {
               fprintf(outfile, "%2d+ ", i/2);


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2013-12-24 18:03:06 UTC (rev 1417)
+++ code/trunk/testdata/testinput2    2013-12-27 12:23:25 UTC (rev 1418)
@@ -4045,4 +4045,7 @@


 /[a[:<:]] should give error/

+/(?=ab\K)/+
+    abcd
+
 /-- End of testinput2 --/


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2013-12-24 18:03:06 UTC (rev 1417)
+++ code/trunk/testdata/testoutput2    2013-12-27 12:23:25 UTC (rev 1418)
@@ -14125,4 +14125,10 @@
 /[a[:<:]] should give error/ 
 Failed: unknown POSIX class name at offset 4


+/(?=ab\K)/+
+    abcd
+Start of matched string is beyond its end - displaying from end to start.
+ 0: ab
+ 0+ abcd
+
 /-- End of testinput2 --/