[Pcre-svn] [918] code/trunk: Fix \C bug with repeated charac…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [918] code/trunk: Fix \C bug with repeated character classes in UTF-8 mode.
Revision: 918
          http://www.exim.org/viewvc/pcre2?view=rev&revision=918
Author:   ph10
Date:     2018-02-19 17:26:33 +0000 (Mon, 19 Feb 2018)
Log Message:
-----------
Fix \C bug with repeated character classes in UTF-8 mode.


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/src/pcre2_match.c
    code/trunk/testdata/testinput22
    code/trunk/testdata/testoutput22-16
    code/trunk/testdata/testoutput22-32
    code/trunk/testdata/testoutput22-8


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2018-02-19 17:00:45 UTC (rev 917)
+++ code/trunk/ChangeLog    2018-02-19 17:26:33 UTC (rev 918)
@@ -20,7 +20,12 @@
 specified. Similarly, running "pcfre2test -C bsr" never produced the result 
 ANY.


+4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
+multi-code-unit characters caused bad behaviour and possibly a crash. This
+issue was fixed for other kinds of repeat in release 10.20 by change 19, but
+repeating character classes were overlooked.

+
 Version 10.31 12-February-2018
 ------------------------------


Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c    2018-02-19 17:00:45 UTC (rev 917)
+++ code/trunk/src/pcre2_match.c    2018-02-19 17:26:33 UTC (rev 918)
@@ -1962,11 +1962,15 @@


           if (reptype == REPTYPE_POS) continue;    /* No backtracking */


+          /* After \C in UTF mode, Lstart_eptr might be in the middle of a
+          Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't
+          go too far. */
+
           for (;;)
             {
             RMATCH(Fecode, RM201);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (Feptr-- == Lstart_eptr) break;  /* Tried at original position */
+            if (Feptr-- <= Lstart_eptr) break;  /* Tried at original position */
             BACKCHAR(Feptr);
             }
           }
@@ -2126,11 +2130,15 @@


         if (reptype == REPTYPE_POS) continue;    /* No backtracking */


+        /* After \C in UTF mode, Lstart_eptr might be in the middle of a
+        Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't
+        go too far. */
+
         for(;;)
           {
           RMATCH(Fecode, RM101);
           if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-          if (Feptr-- == Lstart_eptr) break;  /* Tried at original position */
+          if (Feptr-- <= Lstart_eptr) break;  /* Tried at original position */
 #ifdef SUPPORT_UNICODE
           if (utf) BACKCHAR(Feptr);
 #endif
@@ -4002,8 +4010,8 @@
         if (reptype == REPTYPE_POS) continue;    /* No backtracking */


         /* After \C in UTF mode, Lstart_eptr might be in the middle of a
-        Unicode character. Use <= pp to ensure backtracking doesn't go too far.
-        */
+        Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't
+        go too far. */


         for(;;)
           {


Modified: code/trunk/testdata/testinput22
===================================================================
--- code/trunk/testdata/testinput22    2018-02-19 17:00:45 UTC (rev 917)
+++ code/trunk/testdata/testinput22    2018-02-19 17:26:33 UTC (rev 918)
@@ -99,4 +99,7 @@
 \= Expect no match - tests \C at end of subject
     ab


+/\C[^\v]+\x80/utf
+    [AΏBŀC]
+
 # End of testinput22


Modified: code/trunk/testdata/testoutput22-16
===================================================================
--- code/trunk/testdata/testoutput22-16    2018-02-19 17:00:45 UTC (rev 917)
+++ code/trunk/testdata/testoutput22-16    2018-02-19 17:26:33 UTC (rev 918)
@@ -172,4 +172,8 @@
     ab
 No match


+/\C[^\v]+\x80/utf
+    [AΏBŀC]
+No match
+
 # End of testinput22


Modified: code/trunk/testdata/testoutput22-32
===================================================================
--- code/trunk/testdata/testoutput22-32    2018-02-19 17:00:45 UTC (rev 917)
+++ code/trunk/testdata/testoutput22-32    2018-02-19 17:26:33 UTC (rev 918)
@@ -170,4 +170,8 @@
     ab
 No match


+/\C[^\v]+\x80/utf
+    [AΏBŀC]
+No match
+
 # End of testinput22


Modified: code/trunk/testdata/testoutput22-8
===================================================================
--- code/trunk/testdata/testoutput22-8    2018-02-19 17:00:45 UTC (rev 917)
+++ code/trunk/testdata/testoutput22-8    2018-02-19 17:26:33 UTC (rev 918)
@@ -174,4 +174,8 @@
     ab
 No match


+/\C[^\v]+\x80/utf
+    [AΏBŀC]
+No match
+
 # End of testinput22
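
Illustration of the comparison change in pcre2_match.c: the following is a minimal standalone sketch, not code from PCRE2. It models the backtracking loop with byte indices instead of pointers. The names back_char, backtrack and sample_subject are hypothetical; back_char only approximates what BACKCHAR does in the 8-bit library (stepping back over UTF-8 continuation bytes), and the "match attempt" is assumed to fail every time. The sample subject is U+038F followed by "BC", with the start index placed in the middle of the first character, as happens after \C has consumed one code unit.

```c
#include <stdio.h>

/* Hypothetical subject: U+038F (bytes CE 8F) followed by "BC".
   The literal is split so that 'B' is not read as a hex digit. */
static const unsigned char sample_subject[] = "\xCE\x8F" "BC";

/* Hypothetical stand-in for BACKCHAR in 8-bit UTF mode: step back over
   UTF-8 continuation bytes (10xxxxxx). Clamped at index 0 so that this
   sketch never reads outside the buffer. */
static int back_char(const unsigned char *s, int pos)
{
  while (pos > 0 && (s[pos] & 0xC0) == 0x80) pos--;
  return pos;
}

/* Models the backtracking loop. 'start' plays the role of Lstart_eptr and
   'pos' the role of Feptr. Every match attempt is assumed to fail.
   Returns the number of positions tried, or -1 if the index ran off the
   front of the subject (the real code would read invalid memory there). */
static int backtrack(const unsigned char *s, int start, int pos, int use_le)
{
  int tries = 0;
  for (;;)
    {
    int at_start;
    tries++;                          /* stands in for the RMATCH() attempt */
    at_start = use_le ? (pos <= start) : (pos == start);
    pos--;                            /* the post-decrement in the real test */
    if (at_start) break;              /* tried at (or before) the start */
    if (pos < 0) return -1;           /* ran past the front of the subject */
    pos = back_char(s, pos);
    }
  return tries;
}
```

With start = 1 (the continuation byte 8F) and pos = 4, the == form steps from index 2 to 1, back_char moves on to index 0, and the loop never sees pos equal to start, so backtrack(sample_subject, 1, 4, 0) returns -1. The <= form stops once pos has passed start, so backtrack(sample_subject, 1, 4, 1) returns 4. When start lies on a character boundary (start = 2), both forms return 3.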