[Pcre-svn] [245] code/trunk: Fix other cases where backtrack…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [245] code/trunk: Fix other cases where backtracking after \C could cause a crash.
Revision: 245
          http://www.exim.org/viewvc/pcre2?view=rev&revision=245
Author:   ph10
Date:     2015-04-08 17:53:22 +0100 (Wed, 08 Apr 2015)


Log Message:
-----------
Fix other cases where backtracking after \C could cause a crash.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/src/pcre2_match.c
    code/trunk/testdata/testinput4
    code/trunk/testdata/testoutput4


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2015-04-08 16:33:58 UTC (rev 244)
+++ code/trunk/ChangeLog    2015-04-08 16:53:22 UTC (rev 245)
@@ -74,9 +74,10 @@
 the code there did catch the loop.


 19. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*),
-and a subsequent item in the pattern caused a non-match, backtracking over the
-repeated \X did not stop, but carried on past the start of the subject, causing
-reference to random memory and/or a segfault. This bug was discovered by the
+and a subsequent item in the pattern caused a non-match, backtracking over the
+repeated \X did not stop, but carried on past the start of the subject, causing
+reference to random memory and/or a segfault. There were also some other cases
+where backtracking after \C could crash. This set of bugs was discovered by the
 LLVM fuzzer.



Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c    2015-04-08 16:33:58 UTC (rev 244)
+++ code/trunk/src/pcre2_match.c    2015-04-08 16:53:22 UTC (rev 245)
@@ -3576,9 +3576,13 @@
             }


           if (possessive) continue;    /* No backtracking */
+
+          /* After \C in UTF mode, pp might be in the middle of a Unicode
+          character. Use <= pp to ensure backtracking doesn't go too far. */
+
           for(;;)
             {
-            if (eptr == pp) goto TAIL_RECURSE;
+            if (eptr <= pp) goto TAIL_RECURSE;
             RMATCH(eptr, ecode, offset_top, mb, eptrb, RM23);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
             eptr--;
@@ -3973,9 +3977,13 @@
             eptr += len;
             }
           if (possessive) continue;    /* No backtracking */
+
+          /* After \C in UTF mode, pp might be in the middle of a Unicode
+          character. Use <= pp to ensure backtracking doesn't go too far. */
+
           for(;;)
             {
-            if (eptr == pp) goto TAIL_RECURSE;
+            if (eptr <= pp) goto TAIL_RECURSE;
             RMATCH(eptr, ecode, offset_top, mb, eptrb, RM30);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
             eptr--;
@@ -4108,9 +4116,13 @@
             eptr += len;
             }
           if (possessive) continue;    /* No backtracking */
+
+          /* After \C in UTF mode, pp might be in the middle of a Unicode
+          character. Use <= pp to ensure backtracking doesn't go too far. */
+
           for(;;)
             {
-            if (eptr == pp) goto TAIL_RECURSE;
+            if (eptr <= pp) goto TAIL_RECURSE;
             RMATCH(eptr, ecode, offset_top, mb, eptrb, RM34);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
             eptr--;
@@ -5679,9 +5691,13 @@
         /* eptr is now past the end of the maximum run */


         if (possessive) continue;    /* No backtracking */
+
+        /* After \C in UTF mode, pp might be in the middle of a Unicode
+        character. Use <= pp to ensure backtracking doesn't go too far. */
+
         for(;;)
           {
-          if (eptr == pp) goto TAIL_RECURSE;
+          if (eptr <= pp) goto TAIL_RECURSE;
           RMATCH(eptr, ecode, offset_top, mb, eptrb, RM44);
           if (rrc != MATCH_NOMATCH) RRETURN(rrc);
           eptr--;
@@ -5999,9 +6015,13 @@
           }


         if (possessive) continue;    /* No backtracking */
+
+        /* After \C in UTF mode, pp might be in the middle of a Unicode
+        character. Use <= pp to ensure backtracking doesn't go too far. */
+
         for(;;)
           {
-          if (eptr == pp) goto TAIL_RECURSE;
+          if (eptr <= pp) goto TAIL_RECURSE;
           RMATCH(eptr, ecode, offset_top, mb, eptrb, RM46);
           if (rrc != MATCH_NOMATCH) RRETURN(rrc);
           eptr--;


Modified: code/trunk/testdata/testinput4
===================================================================
--- code/trunk/testdata/testinput4    2015-04-08 16:33:58 UTC (rev 244)
+++ code/trunk/testdata/testinput4    2015-04-08 16:53:22 UTC (rev 245)
@@ -2227,4 +2227,7 @@
 /utf
     Ӆ\x0a


+/\C(\W?ſ)'?{{/utf
+    \\C(\\W?ſ)'?{{
+
 # End of testinput4


Modified: code/trunk/testdata/testoutput4
===================================================================
--- code/trunk/testdata/testoutput4    2015-04-08 16:33:58 UTC (rev 244)
+++ code/trunk/testdata/testoutput4    2015-04-08 16:53:22 UTC (rev 245)
@@ -3748,4 +3748,8 @@
     Ӆ\x0a
 No match


+/\C(\W?ſ)'?{{/utf
+    \\C(\\W?ſ)'?{{
+No match
+
 # End of testinput4
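
The reason for replacing == with <= in the backtracking loops of pcre2_match.c can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the PCRE2 code: backtrack() and is_continuation() are hypothetical names, positions are array indices rather than pointers so that an overrun can be detected safely, and the step back to the first byte of a character after the decrement is an assumption about the UTF-8 path that the diff context above does not show.

```c
#include <stdbool.h>
#include <stddef.h>

/* True for a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static bool is_continuation(unsigned char b)
{
  return (b & 0xC0) == 0x80;
}

/* Hypothetical model of one backtracking loop.
   eptr   index just past the end of the greedy run
   pp     index where the repeat started; after \C in UTF mode this
          may point into the middle of a multi-byte character
   use_le selects the new test (eptr <= pp) or the old one (eptr == pp)
   Returns the index at which the loop stopped, or -1 if the loop
   stepped back past the start of the subject. */
static long backtrack(const unsigned char *subject, long eptr, long pp,
                      bool use_le)
{
  for (;;)
    {
    if (use_le ? (eptr <= pp) : (eptr == pp)) return eptr;

    /* The real code retries the rest of the pattern here (RMATCH). */

    eptr--;
    if (eptr < 0) return -1;    /* Ran past the start of the subject */

    /* Assumed step: move back to the first byte of the character. */
    while (eptr > 0 && is_continuation(subject[eptr])) eptr--;
    }
}
```

With the subject bytes C3 A9 61 62 (a two-byte character followed by "ab"), eptr = 4 and pp = 1, the step back from index 2 lands on index 0 and never visits index 1. The == test therefore never fires and the model returns -1, whereas the <= test stops the loop at index 0.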