Revision: 578
http://vcs.pcre.org/viewvc?view=rev&revision=578
Author: ph10
Date: 2010-11-23 15:34:55 +0000 (Tue, 23 Nov 2010)
Log Message:
-----------
Fix internal error for recursive named back references.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/pcre_compile.c
code/trunk/testdata/testinput11
code/trunk/testdata/testinput2
code/trunk/testdata/testoutput11
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2010-11-21 18:51:22 UTC (rev 577)
+++ code/trunk/ChangeLog 2010-11-23 15:34:55 UTC (rev 578)
@@ -120,6 +120,12 @@
to pcregrep and other applications that have no direct access to PCRE
options. The new /Y option in pcretest sets this option when calling
pcre_compile().
+
+21. Change 18 of release 8.01 broke the use of named subpatterns for recursive
+ back references. Groups containing recursive back references were forced to
+ be atomic by that change, but in the case of named groups, the amount of
+ memory required was incorrectly computed, leading to "Failed: internal
+ error: code overflow". This has been fixed.
Version 8.10 25-Jun-2010
Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c 2010-11-21 18:51:22 UTC (rev 577)
+++ code/trunk/pcre_compile.c 2010-11-23 15:34:55 UTC (rev 578)
@@ -1105,11 +1105,22 @@
start at a parenthesis. It scans along a pattern's text looking for capturing
subpatterns, and counting them. If it finds a named pattern that matches the
name it is given, it returns its number. Alternatively, if the name is NULL, it
-returns when it reaches a given numbered subpattern. We know that if (?P< is
-encountered, the name will be terminated by '>' because that is checked in the
-first pass. Recursion is used to keep track of subpatterns that reset the
-capturing group numbers - the (?| feature.
+returns when it reaches a given numbered subpattern. Recursion is used to keep
+track of subpatterns that reset the capturing group numbers - the (?| feature.
+This function was originally called only from the second pass, in which we know
+that if (?< or (?' or (?P< is encountered, the name will be correctly
+terminated because that is checked in the first pass. There is now one call to
+this function in the first pass, to check for a recursive back reference by
+name (so that we can make the whole group atomic). In this case, we need check
+only up to the current position in the pattern, and that is still OK because
+any previous occurrences will have been checked. To make this work, the test
+for "end of pattern" is a check against cd->end_pattern in the main loop,
+instead of looking for a binary zero. This means that the special first-pass
+call can adjust cd->end_pattern temporarily. (Checks for binary zero while
+processing items within the loop are OK, because afterwards the main loop will
+terminate.)
+
Arguments:
ptrptr address of the current character pointer (updated)
cd compile background data
@@ -1209,9 +1220,11 @@
}
/* Past any initial parenthesis handling, scan for parentheses or vertical
-bars. */
+bars. Stop if we get to cd->end_pattern. Note that this is important for the
+first-pass call when this value is temporarily adjusted to stop at the current
+position. So DO NOT change this to a test for binary zero. */
-for (; *ptr != 0; ptr++)
+for (; ptr < cd->end_pattern; ptr++)
{
/* Skip over backslashed characters and also entire \Q...\E */
@@ -5373,11 +5386,17 @@
while ((cd->ctypes[*ptr] & ctype_word) != 0) ptr++;
namelen = (int)(ptr - name);
- /* In the pre-compile phase, do a syntax check and set a dummy
- reference number. */
+ /* In the pre-compile phase, do a syntax check. We used to just set
+ a dummy reference number, because it was not used in the first pass.
+ However, with the change of recursive back references to be atomic,
+ we have to look for the number so that this state can be identified, as
+ otherwise the incorrect length is computed. If it's not a backwards
+ reference, the dummy number will do. */
if (lengthptr != NULL)
{
+ const uschar *temp;
+
if (namelen == 0)
{
*errorcodeptr = ERR62;
@@ -5393,7 +5412,22 @@
*errorcodeptr = ERR48;
goto FAILED;
}
- recno = 0;
+
+ /* The name table does not exist in the first pass, so we cannot
+ do a simple search as in the code below. Instead, we have to scan the
+ pattern to find the number. It is important that we scan it only as
+ far as we have got because the syntax of named subpatterns has not
+ been checked for the rest of the pattern, and find_parens() assumes
+ correct syntax. In any case, it's a waste of resources to scan
+ further. We stop the scan at the current point by temporarily
+ adjusting the value of cd->end_pattern. */
+
+ temp = cd->end_pattern;
+ cd->end_pattern = ptr;
+ recno = find_parens(cd, name, namelen,
+ (options & PCRE_EXTENDED) != 0, utf8);
+ cd->end_pattern = temp;
+ if (recno < 0) recno = 0; /* Forward ref; set dummy number */
}
/* In the real compile, seek the name in the table. We check the name
Modified: code/trunk/testdata/testinput11
===================================================================
--- code/trunk/testdata/testinput11 2010-11-21 18:51:22 UTC (rev 577)
+++ code/trunk/testdata/testinput11 2010-11-23 15:34:55 UTC (rev 578)
@@ -504,4 +504,7 @@
/(*SKIP)b/
a
+/(?P<abn>(?P=abn)xxx|)+/
+ xxx
+
/-- End of testinput11 --/
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2010-11-21 18:51:22 UTC (rev 577)
+++ code/trunk/testdata/testinput2 2010-11-23 15:34:55 UTC (rev 578)
@@ -3560,4 +3560,14 @@
/^\cģ/
+/(?P<abn>(?P=abn)xxx)/BZ
+
+/(a\1z)/BZ
+
+/(?P<abn>(?P=abn)(?<badstufxxx)/BZ
+
+/(?P<abn>(?P=axn)xxx)/BZ
+
+/(?P<abn>(?P=axn)xxx)(?<axn>yy)/BZ
+
/-- End of testinput2 --/
Modified: code/trunk/testdata/testoutput11
===================================================================
--- code/trunk/testdata/testoutput11 2010-11-21 18:51:22 UTC (rev 577)
+++ code/trunk/testdata/testoutput11 2010-11-23 15:34:55 UTC (rev 578)
@@ -970,4 +970,9 @@
a
No match
+/(?P<abn>(?P=abn)xxx|)+/
+ xxx
+ 0:
+ 1:
+
/-- End of testinput11 --/
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2010-11-21 18:51:22 UTC (rev 577)
+++ code/trunk/testdata/testoutput2 2010-11-23 15:34:55 UTC (rev 578)
@@ -11258,4 +11258,51 @@
/^\cģ/
Failed: \c must be followed by an ASCII character at offset 3
+/(?P<abn>(?P=abn)xxx)/BZ
+------------------------------------------------------------------
+ Bra
+ Once
+ CBra 1
+ \1
+ xxx
+ Ket
+ Ket
+ Ket
+ End
+------------------------------------------------------------------
+
+/(a\1z)/BZ
+------------------------------------------------------------------
+ Bra
+ Once
+ CBra 1
+ a
+ \1
+ z
+ Ket
+ Ket
+ Ket
+ End
+------------------------------------------------------------------
+
+/(?P<abn>(?P=abn)(?<badstufxxx)/BZ
+Failed: syntax error in subpattern name (missing terminator) at offset 29
+
+/(?P<abn>(?P=axn)xxx)/BZ
+Failed: reference to non-existent subpattern at offset 15
+
+/(?P<abn>(?P=axn)xxx)(?<axn>yy)/BZ
+------------------------------------------------------------------
+ Bra
+ CBra 1
+ \2
+ xxx
+ Ket
+ CBra 2
+ yy
+ Ket
+ Ket
+ End
+------------------------------------------------------------------
+
/-- End of testinput2 --/
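
Note on the technique used in pcre_compile.c above: the first-pass lookup works by saving cd->end_pattern, pointing it at the current position, scanning, and then restoring it, with the scanning loop bounded by that pointer rather than by a binary zero. The following is only a minimal, self-contained sketch of that idea and is not the PCRE code. The names toy_compile_data, toy_find_named, toy_first_pass_recno and toy_demo are hypothetical, and the toy scanner recognises only plain ( and (?P<name> groups (no escapes, character classes, (?<name>, (?'name' or (?| handling).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PCRE's compile data: only the scan limit. */
typedef struct {
    const char *end_pattern;   /* normally the end of the whole pattern */
} toy_compile_data;

/* Scan from 'start' up to cd->end_pattern, numbering capturing groups as
   they open. Returns the number of the group called 'name', or -1 if no
   such group opens before the limit. */
static int toy_find_named(const toy_compile_data *cd, const char *start,
                          const char *name, size_t namelen)
{
    int count = 0;
    const char *p;

    /* Bounded by the pointer, not by a terminating zero, so that a caller
       can shorten the scan by adjusting cd->end_pattern. */
    for (p = start; p < cd->end_pattern; p++) {
        if (*p != '(') continue;
        if (p + 1 < cd->end_pattern && p[1] == '?') {
            /* Of the (?... forms, this toy treats only (?P<name> as capturing. */
            if (cd->end_pattern - p >= 4 && p[2] == 'P' && p[3] == '<') {
                const char *n = p + 4;
                const char *q = n;
                while (q < cd->end_pattern && *q != '>') q++;
                count++;
                if (q < cd->end_pattern && (size_t)(q - n) == namelen &&
                    memcmp(n, name, namelen) == 0)
                    return count;
            }
            continue;
        }
        count++;   /* plain capturing parenthesis */
    }
    return -1;
}

/* First-pass style lookup: limit the scan to the text already seen (up to
   'ptr'), restore the limit, and map "not found" to a dummy number 0. */
static int toy_first_pass_recno(toy_compile_data *cd, const char *pattern,
                                const char *ptr, const char *name,
                                size_t namelen)
{
    const char *saved = cd->end_pattern;
    int recno;

    cd->end_pattern = ptr;
    recno = toy_find_named(cd, pattern, name, namelen);
    cd->end_pattern = saved;
    if (recno < 0) recno = 0;   /* forward reference: dummy number */
    return recno;
}

/* Convenience wrapper: locate the first (?P=name) reference in 'pattern'
   and look its name up as the first pass would at that point. */
static int toy_demo(const char *pattern)
{
    toy_compile_data cd;
    const char *ref = strstr(pattern, "(?P=");
    const char *name, *ptr;

    assert(ref != NULL);
    name = ptr = ref + 4;
    while (*ptr != 0 && *ptr != ')') ptr++;
    cd.end_pattern = pattern + strlen(pattern);
    return toy_first_pass_recno(&cd, pattern, ptr, name, (size_t)(ptr - name));
}
```

In this sketch, toy_demo() on the pattern (?P<abn>(?P=abn)xxx) yields group number 1, because the named group opens before the reference. A reference to a name that is only defined later in the pattern yields the dummy number 0, since the scan stops at the current position.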