[Pcre-svn] [578] code/trunk: Fix internal error for recursiv…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [578] code/trunk: Fix internal error for recursive named back references.
Revision: 578
          http://vcs.pcre.org/viewvc?view=rev&revision=578
Author:   ph10
Date:     2010-11-23 15:34:55 +0000 (Tue, 23 Nov 2010)


Log Message:
-----------
Fix internal error for recursive named back references.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/pcre_compile.c
    code/trunk/testdata/testinput11
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput11
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2010-11-21 18:51:22 UTC (rev 577)
+++ code/trunk/ChangeLog    2010-11-23 15:34:55 UTC (rev 578)
@@ -120,6 +120,12 @@
     to pcregrep and other applications that have no direct access to PCRE 
     options. The new /Y option in pcretest sets this option when calling 
     pcre_compile().  
+    
+21. Change 18 of release 8.01 broke the use of named subpatterns for recursive
+    back references. Groups containing recursive back references were forced to 
+    be atomic by that change, but in the case of named groups, the amount of 
+    memory required was incorrectly computed, leading to "Failed: internal 
+    error: code overflow". This has been fixed.  



Version 8.10 25-Jun-2010

Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c    2010-11-21 18:51:22 UTC (rev 577)
+++ code/trunk/pcre_compile.c    2010-11-23 15:34:55 UTC (rev 578)
@@ -1105,11 +1105,22 @@
 start at a parenthesis. It scans along a pattern's text looking for capturing
 subpatterns, and counting them. If it finds a named pattern that matches the
 name it is given, it returns its number. Alternatively, if the name is NULL, it
-returns when it reaches a given numbered subpattern. We know that if (?P< is
-encountered, the name will be terminated by '>' because that is checked in the
-first pass. Recursion is used to keep track of subpatterns that reset the
-capturing group numbers - the (?| feature.
+returns when it reaches a given numbered subpattern. Recursion is used to keep
+track of subpatterns that reset the capturing group numbers - the (?| feature.


+This function was originally called only from the second pass, in which we know
+that if (?< or (?' or (?P< is encountered, the name will be correctly
+terminated because that is checked in the first pass. There is now one call to
+this function in the first pass, to check for a recursive back reference by
+name (so that we can make the whole group atomic). In this case, we need check
+only up to the current position in the pattern, and that is still OK because 
+any previous occurrences will have been checked. To make this work, the test 
+for "end of pattern" is a check against cd->end_pattern in the main loop, 
+instead of looking for a binary zero. This means that the special first-pass
+call can adjust cd->end_pattern temporarily. (Checks for binary zero while 
+processing items within the loop are OK, because afterwards the main loop will 
+terminate.)
+
 Arguments:
   ptrptr       address of the current character pointer (updated)
   cd           compile background data
@@ -1209,9 +1220,11 @@
   }


/* Past any initial parenthesis handling, scan for parentheses or vertical
-bars. */
+bars. Stop if we get to cd->end_pattern. Note that this is important for the
+first-pass call when this value is temporarily adjusted to stop at the current
+position. So DO NOT change this to a test for binary zero. */

-for (; *ptr != 0; ptr++)
+for (; ptr < cd->end_pattern; ptr++)
{
/* Skip over backslashed characters and also entire \Q...\E */

@@ -5373,11 +5386,17 @@
         while ((cd->ctypes[*ptr] & ctype_word) != 0) ptr++;
         namelen = (int)(ptr - name);


-        /* In the pre-compile phase, do a syntax check and set a dummy
-        reference number. */
+        /* In the pre-compile phase, do a syntax check. We used to just set
+        a dummy reference number, because it was not used in the first pass.
+        However, with the change of recursive back references to be atomic,
+        we have to look for the number so that this state can be identified, as
+        otherwise the incorrect length is computed. If it's not a backwards
+        reference, the dummy number will do. */


         if (lengthptr != NULL)
           {
+          const uschar *temp; 
+           
           if (namelen == 0)
             {
             *errorcodeptr = ERR62;
@@ -5393,7 +5412,22 @@
             *errorcodeptr = ERR48;
             goto FAILED;
             }
-          recno = 0;
+            
+          /* The name table does not exist in the first pass, so we cannot
+          do a simple search as in the code below. Instead, we have to scan the 
+          pattern to find the number. It is important that we scan it only as
+          far as we have got because the syntax of named subpatterns has not 
+          been checked for the rest of the pattern, and find_parens() assumes 
+          correct syntax. In any case, it's a waste of resources to scan 
+          further. We stop the scan at the current point by temporarily 
+          adjusting the value of cd->end_pattern. */
+          
+          temp = cd->end_pattern;
+          cd->end_pattern = ptr;
+          recno = find_parens(cd, name, namelen, 
+            (options & PCRE_EXTENDED) != 0, utf8);
+          cd->end_pattern = temp;   
+          if (recno < 0) recno = 0;    /* Forward ref; set dummy number */
           }


         /* In the real compile, seek the name in the table. We check the name


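The pcre_compile.c change above bounds the scan by temporarily moving the end-of-pattern pointer, then restores it. The following is a minimal standalone sketch of that idea, not the PCRE implementation. The names scan_ctx, find_named_group and lookup_backref are hypothetical, and the scanner is deliberately simplified: it recognizes only plain (...) and (?P<name>...) groups and ignores escapes, character classes, and the (?| feature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PCRE's compile data; only the field used here. */
typedef struct {
    const char *end_pattern;   /* one past the last character to scan */
} scan_ctx;

/* Simplified scan (assumption: not PCRE's find_parens). Counts capturing
   groups in [start, ctx->end_pattern) and returns the number of the group
   called `name`, or -1 if no such group is found in that range. The loop
   tests against end_pattern rather than a terminating zero, so a caller
   can shorten the scan by moving end_pattern. */
static int find_named_group(const scan_ctx *ctx, const char *start,
                            const char *name, size_t namelen)
{
    int count = 0;
    const char *p;

    for (p = start; p < ctx->end_pattern; p++) {
        if (*p != '(') continue;

        if (p + 1 < ctx->end_pattern && p[1] == '?') {
            /* "(?P<" opens a named capturing group. */
            if (ctx->end_pattern - p >= 4 && p[2] == 'P' && p[3] == '<') {
                const char *n = p + 4;
                const char *q = n;
                while (q < ctx->end_pattern && *q != '>') q++;
                count++;
                if (q < ctx->end_pattern &&
                    (size_t)(q - n) == namelen &&
                    memcmp(n, name, namelen) == 0)
                    return count;
            }
            /* Any other "(?" item is treated as non-capturing here. */
            continue;
        }

        count++;   /* plain capturing group */
    }
    return -1;
}

/* First-pass style lookup: scan only as far as `current`, then restore the
   real end. A name not yet seen is treated as a forward reference and gets
   the dummy number 0. */
static int lookup_backref(scan_ctx *ctx, const char *pattern,
                          const char *current,
                          const char *name, size_t namelen)
{
    const char *saved = ctx->end_pattern;
    int recno;

    ctx->end_pattern = current;
    recno = find_named_group(ctx, pattern, name, namelen);
    ctx->end_pattern = saved;

    return (recno < 0) ? 0 : recno;
}
```

In this sketch, a reference to a group that has already been opened resolves to its real number, while a reference to a group defined later in the pattern yields 0, mirroring the "Forward ref; set dummy number" case in the diff.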
Modified: code/trunk/testdata/testinput11
===================================================================
--- code/trunk/testdata/testinput11    2010-11-21 18:51:22 UTC (rev 577)
+++ code/trunk/testdata/testinput11    2010-11-23 15:34:55 UTC (rev 578)
@@ -504,4 +504,7 @@
 /(*SKIP)b/
     a 


+/(?P<abn>(?P=abn)xxx|)+/
+    xxx
+
 /-- End of testinput11 --/


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2010-11-21 18:51:22 UTC (rev 577)
+++ code/trunk/testdata/testinput2    2010-11-23 15:34:55 UTC (rev 578)
@@ -3560,4 +3560,14 @@


/^\cģ/

+/(?P<abn>(?P=abn)xxx)/BZ
+
+/(a\1z)/BZ
+
+/(?P<abn>(?P=abn)(?<badstufxxx)/BZ
+
+/(?P<abn>(?P=axn)xxx)/BZ
+
+/(?P<abn>(?P=axn)xxx)(?<axn>yy)/BZ
+
/-- End of testinput2 --/

Modified: code/trunk/testdata/testoutput11
===================================================================
--- code/trunk/testdata/testoutput11    2010-11-21 18:51:22 UTC (rev 577)
+++ code/trunk/testdata/testoutput11    2010-11-23 15:34:55 UTC (rev 578)
@@ -970,4 +970,9 @@
     a 
 No match


+/(?P<abn>(?P=abn)xxx|)+/
+    xxx
+ 0: 
+ 1: 
+
 /-- End of testinput11 --/


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2010-11-21 18:51:22 UTC (rev 577)
+++ code/trunk/testdata/testoutput2    2010-11-23 15:34:55 UTC (rev 578)
@@ -11258,4 +11258,51 @@
 /^\cģ/
 Failed: \c must be followed by an ASCII character at offset 3


+/(?P<abn>(?P=abn)xxx)/BZ
+------------------------------------------------------------------
+        Bra
+        Once
+        CBra 1
+        \1
+        xxx
+        Ket
+        Ket
+        Ket
+        End
+------------------------------------------------------------------
+
+/(a\1z)/BZ
+------------------------------------------------------------------
+        Bra
+        Once
+        CBra 1
+        a
+        \1
+        z
+        Ket
+        Ket
+        Ket
+        End
+------------------------------------------------------------------
+
+/(?P<abn>(?P=abn)(?<badstufxxx)/BZ
+Failed: syntax error in subpattern name (missing terminator) at offset 29
+
+/(?P<abn>(?P=axn)xxx)/BZ
+Failed: reference to non-existent subpattern at offset 15
+
+/(?P<abn>(?P=axn)xxx)(?<axn>yy)/BZ
+------------------------------------------------------------------
+        Bra
+        CBra 1
+        \2
+        xxx
+        Ket
+        CBra 2
+        yy
+        Ket
+        Ket
+        End
+------------------------------------------------------------------
+
 /-- End of testinput2 --/