Revision: 1418
http://vcs.pcre.org/viewvc?view=rev&revision=1418
Author: ph10
Date: 2013-12-27 12:23:25 +0000 (Fri, 27 Dec 2013)
Log Message:
-----------
Fix pcretest's handling of patterns when \K in an assertion sets the start of a
match past the end of the match.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/pcrepattern.3
code/trunk/doc/pcresyntax.3
code/trunk/pcretest.c
code/trunk/testdata/testinput2
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2013-12-24 18:03:06 UTC (rev 1417)
+++ code/trunk/ChangeLog 2013-12-27 12:23:25 UTC (rev 1418)
@@ -14,6 +14,12 @@
3. Got rid of some compiler warnings for potentially uninitialized variables
that show up only when compiled with -O2.
+
+4. A pattern such as (?=ab\K) that uses \K in an assertion can set the start
 of a match later than the end of the match. The pcretest program was not
+ handling the case sensibly - it was outputting from the start to the next
+ binary zero. It now reports this situation in a message, and outputs the
+ text from the end to the start.
Version 8.34 15-December-2013
Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3 2013-12-24 18:03:06 UTC (rev 1417)
+++ code/trunk/doc/pcrepattern.3 2013-12-27 12:23:25 UTC (rev 1418)
@@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "03 December 2013" "PCRE 8.34"
+.TH PCREPATTERN 3 "27 December 2013" "PCRE 8.35"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -1004,7 +1004,9 @@
.P
Perl documents that the use of \eK within assertions is "not well defined". In
PCRE, \eK is acted upon when it occurs inside positive assertions, but is
-ignored in negative assertions.
+ignored in negative assertions. Note that when a pattern such as (?=ab\eK)
+matches, the reported start of the match can be greater than the end of the
+match.
.
.
.\" HTML <a name="smallassertions"></a>
@@ -3255,6 +3257,6 @@
.rs
.sp
.nf
-Last updated: 03 December 2013
+Last updated: 27 December 2013
Copyright (c) 1997-2013 University of Cambridge.
.fi
Modified: code/trunk/doc/pcresyntax.3
===================================================================
--- code/trunk/doc/pcresyntax.3 2013-12-24 18:03:06 UTC (rev 1417)
+++ code/trunk/doc/pcresyntax.3 2013-12-27 12:23:25 UTC (rev 1418)
@@ -1,4 +1,4 @@
-.TH PCRESYNTAX 3 "12 November 2013" "PCRE 8.34"
+.TH PCRESYNTAX 3 "27 December 2013" "PCRE 8.35"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -309,6 +309,8 @@
.rs
.sp
\eK reset start of match
+.sp
+\eK is honoured in positive assertions, but ignored in negative ones.
.
.
.SH "ALTERNATION"
@@ -508,6 +510,6 @@
.rs
.sp
.nf
-Last updated: 12 November 2013
+Last updated: 27 December 2013
Copyright (c) 1997-2013 University of Cambridge.
.fi
Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c 2013-12-24 18:03:06 UTC (rev 1417)
+++ code/trunk/pcretest.c 2013-12-27 12:23:25 UTC (rev 1418)
@@ -5192,7 +5192,8 @@
if (count * 2 > use_size_offsets) count = use_size_offsets/2;
}
- /* Output the captured substrings */
+ /* Output the captured substrings. Note that, for the matched string,
+ the use of \K in an assertion can make the start later than the end. */
for (i = 0; i < count * 2; i += 2)
{
@@ -5208,11 +5209,25 @@
}
else
{
+ int start = use_offsets[i];
+ int end = use_offsets[i+1];
+
+ if (start > end)
+ {
+ start = use_offsets[i+1];
+ end = use_offsets[i];
+ fprintf(outfile, "Start of matched string is beyond its end - "
+ "displaying from end to start.\n");
+ }
+
fprintf(outfile, "%2d: ", i/2);
- PCHARSV(bptr, use_offsets[i],
- use_offsets[i+1] - use_offsets[i], outfile);
+ PCHARSV(bptr, start, end - start, outfile);
if (verify_jit && jit_was_used) fprintf(outfile, " (JIT)");
fprintf(outfile, "\n");
+
+ /* Note: don't use the start/end variables here because we want to
+ show the text from what is reported as the end. */
+
if (do_showcaprest || (i == 0 && do_showrest))
{
fprintf(outfile, "%2d+ ", i/2);
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2013-12-24 18:03:06 UTC (rev 1417)
+++ code/trunk/testdata/testinput2 2013-12-27 12:23:25 UTC (rev 1418)
@@ -4045,4 +4045,7 @@
/[a[:<:]] should give error/
+/(?=ab\K)/+
+ abcd
+
/-- End of testinput2 --/
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2013-12-24 18:03:06 UTC (rev 1417)
+++ code/trunk/testdata/testoutput2 2013-12-27 12:23:25 UTC (rev 1418)
@@ -14125,4 +14125,10 @@
/[a[:<:]] should give error/
Failed: unknown POSIX class name at offset 4
+/(?=ab\K)/+
+ abcd
+Start of matched string is beyond its end - displaying from end to start.
+ 0: ab
+ 0+ abcd
+
/-- End of testinput2 --/
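
For callers other than pcretest that read the first offset pair themselves, the same guard can be sketched in C as below. This is a minimal illustration, not code from this revision: the helper names normalize_match_span and copy_match_text are hypothetical, and the offsets 2 and 0 used in the note that follows are an assumption inferred from the test output above ("ab" displayed, rest of subject "abcd").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE or pcretest).
   Takes the reported start and end offsets of the matched string and
   computes a printable span. Returns 1 if the start was beyond the end
   (so the span runs from the reported end to the reported start),
   otherwise 0. */
int normalize_match_span(int reported_start, int reported_end,
                         int *span_start, int *span_length)
{
  assert(span_start != NULL && span_length != NULL);
  if (reported_start > reported_end)
    {
    *span_start = reported_end;
    *span_length = reported_start - reported_end;
    return 1;
    }
  *span_start = reported_start;
  *span_length = reported_end - reported_start;
  return 0;
}

/* Hypothetical helper: copies the normalized span of subject into out,
   truncating to fit, and always NUL-terminates. Returns the same flag
   as normalize_match_span(). */
int copy_match_text(const char *subject, int reported_start,
                    int reported_end, char *out, size_t out_size)
{
  int start, length;
  int swapped = normalize_match_span(reported_start, reported_end,
                                     &start, &length);
  assert(subject != NULL && out != NULL && out_size > 0);
  if ((size_t)length >= out_size) length = (int)(out_size - 1);
  memcpy(out, subject + start, (size_t)length);
  out[length] = '\0';
  return swapped;
}
```

Under the assumed offsets, copy_match_text("abcd", 2, 0, buf, sizeof buf) would fill buf with "ab" and return 1, at which point a caller could print a warning like the one pcretest now emits. Anything that shows the remainder of the subject should still start from the reported end, as the comment in the pcretest.c change notes.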