Revision: 888
http://www.exim.org/viewvc/pcre2?view=rev&revision=888
Author: ph10
Date: 2017-12-12 15:01:51 +0000 (Tue, 12 Dec 2017)
Log Message:
-----------
Fix incorrect first matching character when a backreference with zero minimum
repeat starts a pattern (possibly after assertions).
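
The bookkeeping behind this fix can be illustrated with a toy model. The sketch
below is not PCRE2's code: the types item and first_info, the functions scan()
and demo(), and the REQ_SET value are all hypothetical, and the model ignores
assertions, case flags, and the "last code unit" tracking. It assumes only what
the diff shows: a backreference marks the first code unit as "none", and a
zero-minimum quantifier restores a saved "zero" copy of the state. If that
saved copy is not updated by the backreference, the state falls back to
"unset" and the next literal is wrongly recorded as the first code unit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-state flag, loosely modelled on the REQ_UNSET and
   REQ_NONE names in the diff; REQ_SET is invented for this sketch. */
enum { REQ_UNSET, REQ_NONE, REQ_SET };

/* One pattern item: a literal or a backreference, optionally followed by
   a zero-minimum quantifier such as '?'. */
typedef struct {
  int is_backref;
  char ch;        /* used only for literals */
  int zero_min;
} item;

typedef struct {
  int flags;      /* REQ_SET or REQ_NONE */
  char cu;        /* meaningful only when flags == REQ_SET */
} first_info;

/* Scan the items and report the recorded first code unit. When 'fixed' is
   zero, the backreference case leaves the saved "zero" state untouched,
   which mimics the behaviour before the change. */
static first_info scan(const item *items, size_t n, int fixed)
{
  int firstcuflags = REQ_UNSET, zerofirstcuflags = REQ_UNSET;
  char firstcu = 0, zerofirstcu = 0;
  first_info result;

  assert(items != NULL || n == 0);

  for (size_t i = 0; i < n; i++) {
    if (items[i].is_backref) {
      if (firstcuflags == REQ_UNSET) {
        firstcuflags = REQ_NONE;
        if (fixed) zerofirstcuflags = REQ_NONE;
      }
    } else if (firstcuflags == REQ_UNSET) {
      /* First literal: if it turns out to be optional, nothing is known. */
      zerofirstcuflags = REQ_NONE;
      firstcu = items[i].ch;
      firstcuflags = REQ_SET;
    } else {
      /* Later literal: the state before it is what an optional item keeps. */
      zerofirstcu = firstcu;
      zerofirstcuflags = firstcuflags;
    }

    /* A zero-minimum repeat restores the saved state. */
    if (items[i].zero_min) {
      firstcu = zerofirstcu;
      firstcuflags = zerofirstcuflags;
    }
  }

  result.flags = (firstcuflags == REQ_SET) ? REQ_SET : REQ_NONE;
  result.cu = (firstcuflags == REQ_SET) ? firstcu : 0;
  return result;
}

/* Model of the part of the pattern that follows the assertion: \1?b */
static first_info demo(int fixed)
{
  const item items[] = {
    { 1, 0,   1 },   /* \1? */
    { 0, 'b', 0 }    /* b   */
  };
  return scan(items, sizeof items / sizeof items[0], fixed);
}
```

In this model, demo(0) reports 'b' as the first code unit, which would wrongly
rule out a match starting at "a", while demo(1) reports that no first code unit
is known.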
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/src/pcre2_compile.c
code/trunk/testdata/testinput2
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2017-12-08 10:25:49 UTC (rev 887)
+++ code/trunk/ChangeLog 2017-12-12 15:01:51 UTC (rev 888)
@@ -65,7 +65,12 @@
lines were processed (assuming 32-bit ints). They have all been changed to
unsigned long ints.
+17. If a backreference with a minimum repeat count of zero was first in a
+pattern, apart from assertions, an incorrect first matching character could be
+recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set
+as the first character of a match.
+
Version 10.30 14-August-2017
----------------------------
Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c 2017-12-08 10:25:49 UTC (rev 887)
+++ code/trunk/src/pcre2_compile.c 2017-12-12 15:01:51 UTC (rev 888)
@@ -7135,7 +7135,7 @@
later. */
HANDLE_SINGLE_REFERENCE:
- if (firstcuflags == REQ_UNSET) firstcuflags = REQ_NONE;
+ if (firstcuflags == REQ_UNSET) zerofirstcuflags = firstcuflags = REQ_NONE;
*code++ = ((options & PCRE2_CASELESS) != 0)? OP_REFI : OP_REF;
PUT2INC(code, 0, meta_arg);
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2017-12-08 10:25:49 UTC (rev 887)
+++ code/trunk/testdata/testinput2 2017-12-12 15:01:51 UTC (rev 888)
@@ -5375,4 +5375,14 @@
/[\d-[:print:]]/
+# Perl gets the second of these wrong, giving no match.
+
+"(?<=(a))\1?b"I
+ ab
+ aaab
+
+"(?=(a))\1?b"I
+ ab
+ aaab
+
# End of testinput2
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2017-12-08 10:25:49 UTC (rev 887)
+++ code/trunk/testdata/testoutput2 2017-12-12 15:01:51 UTC (rev 888)
@@ -16340,6 +16340,34 @@
/[\d-[:print:]]/
Failed: error 150 at offset 3: invalid range in character class
+# Perl gets the second of these wrong, giving no match.
+
+"(?<=(a))\1?b"I
+Capturing subpattern count = 1
+Max back reference = 1
+Max lookbehind = 1
+Last code unit = 'b'
+Subject length lower bound = 1
+ ab
+ 0: b
+ 1: a
+ aaab
+ 0: ab
+ 1: a
+
+"(?=(a))\1?b"I
+Capturing subpattern count = 1
+Max back reference = 1
+Starting code units: a
+Last code unit = 'b'
+Subject length lower bound = 1
+ ab
+ 0: ab
+ 1: a
+ aaab
+ 0: ab
+ 1: a
+
# End of testinput2
Error -65: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data