Revision: 340
http://vcs.pcre.org/viewvc?view=rev&revision=340
Author: ph10
Date: 2008-04-18 21:00:21 +0100 (Fri, 18 Apr 2008)
Log Message:
-----------
Fix incorrect error for patterns like /(?2)[]a()b](abc)/
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/pcre_compile.c
code/trunk/testdata/testinput2
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2008-04-13 15:14:34 UTC (rev 339)
+++ code/trunk/ChangeLog 2008-04-18 20:00:21 UTC (rev 340)
@@ -62,6 +62,15 @@
(a) A lone ] character is dis-allowed (Perl treats it as data).
(b) A back reference to an unmatched subpattern matches an empty string
(Perl fails the current match path).
+
+14. A pattern such as /(?2)[]a()b](abc)/, which has a forward reference to a
+ non-existent subpattern following a character class that starts with ']'
+ and contains (), gave an internal compiling error instead of "reference to
+ non-existent subpattern". Fortunately, when the subpattern did exist, the
+ compiled code was correct. (When scanning forwards to check for the
+ existence of the subpattern, the compiler treated the data ']' as
+ terminating the class, so it got the count wrong. When actually compiling,
+ the reference was subsequently set up correctly.)
Version 7.6 28-Jan-08
Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c 2008-04-13 15:14:34 UTC (rev 339)
+++ code/trunk/pcre_compile.c 2008-04-18 20:00:21 UTC (rev 340)
@@ -1008,10 +1008,33 @@
continue;
}
- /* Skip over character classes */
+ /* Skip over character classes; this logic must be similar to the way they
+ are handled for real. If the first character is '^', skip it. Also, if the
+ first few characters (either before or after ^) are \Q\E or \E we skip them
+ too. This makes for compatibility with Perl. */
if (*ptr == '[')
{
+ BOOL negate_class = FALSE;
+ for (;;)
+ {
+ int c = *(++ptr);
+ if (c == '\\')
+ {
+ if (ptr[1] == 'E') ptr++;
+ else if (strncmp((const char *)ptr+1, "Q\\E", 3) == 0) ptr += 3;
+ else break;
+ }
+ else if (!negate_class && c == '^')
+ negate_class = TRUE;
+ else break;
+ }
+
+ /* If the next character is ']', it is a data character that must be
+ skipped. */
+
+ if (ptr[1] == ']') ptr++;
+
while (*(++ptr) != ']')
{
if (*ptr == 0) return -1;
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2008-04-13 15:14:34 UTC (rev 339)
+++ code/trunk/testdata/testinput2 2008-04-18 20:00:21 UTC (rev 340)
@@ -2667,4 +2667,29 @@
/TA]/<JS>
The ACTA] comes
+/(?2)[]a()b](abc)/
+ abcbabc
+
+/(?2)[^]a()b](abc)/
+ abcbabc
+
+/(?1)[]a()b](abc)/
+ abcbabc
+ ** Failers
+ abcXabc
+
+/(?1)[^]a()b](abc)/
+ abcXabc
+ ** Failers
+ abcbabc
+
+/(?2)[]a()b](abc)(xyz)/
+ xyzbabcxyz
+
+/(?&N)[]a(?<N>)](?<M>abc)/
+ abc<abc
+
+/(?&N)[]a(?<N>)](abc)/
+ abc<abc
+
/ End of testinput2 /
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2008-04-13 15:14:34 UTC (rev 339)
+++ code/trunk/testdata/testoutput2 2008-04-18 20:00:21 UTC (rev 340)
@@ -9545,4 +9545,40 @@
/TA]/<JS>
Failed: ] is an invalid data character in JavaScript compatibility mode at offset 2
+/(?2)[]a()b](abc)/
+Failed: reference to non-existent subpattern at offset 3
+
+/(?2)[^]a()b](abc)/
+Failed: reference to non-existent subpattern at offset 3
+
+/(?1)[]a()b](abc)/
+ abcbabc
+ 0: abcbabc
+ 1: abc
+ ** Failers
+No match
+ abcXabc
+No match
+
+/(?1)[^]a()b](abc)/
+ abcXabc
+ 0: abcXabc
+ 1: abc
+ ** Failers
+No match
+ abcbabc
+No match
+
+/(?2)[]a()b](abc)(xyz)/
+ xyzbabcxyz
+ 0: xyzbabcxyz
+ 1: abc
+ 2: xyz
+
+/(?&N)[]a(?<N>)](?<M>abc)/
+Failed: reference to non-existent subpattern at offset 4
+
+/(?&N)[]a(?<N>)](abc)/
+Failed: reference to non-existent subpattern at offset 4
+
/ End of testinput2 /