Revision: 918
http://www.exim.org/viewvc/pcre2?view=rev&revision=918
Author: ph10
Date: 2018-02-19 17:26:33 +0000 (Mon, 19 Feb 2018)
Log Message:
-----------
Fix \C bug with repeated character classes in UTF-8 mode.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/src/pcre2_match.c
code/trunk/testdata/testinput22
code/trunk/testdata/testoutput22-16
code/trunk/testdata/testoutput22-32
code/trunk/testdata/testoutput22-8
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2018-02-19 17:00:45 UTC (rev 917)
+++ code/trunk/ChangeLog 2018-02-19 17:26:33 UTC (rev 918)
@@ -20,7 +20,12 @@
specified. Similarly, running "pcfre2test -C bsr" never produced the result
ANY.
+4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
+multi-code-unit characters caused bad behaviour and possibly a crash. This
+issue was fixed for other kinds of repeat in release 10.20 by change 19, but
+repeating character classes were overlooked.
+
Version 10.31 12-February-2018
------------------------------
Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c 2018-02-19 17:00:45 UTC (rev 917)
+++ code/trunk/src/pcre2_match.c 2018-02-19 17:26:33 UTC (rev 918)
@@ -1962,11 +1962,15 @@
if (reptype == REPTYPE_POS) continue; /* No backtracking */
+ /* After \C in UTF mode, Lstart_eptr might be in the middle of a
+ Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't
+ go too far. */
+
for (;;)
{
RMATCH(Fecode, RM201);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
- if (Feptr-- == Lstart_eptr) break; /* Tried at original position */
+ if (Feptr-- <= Lstart_eptr) break; /* Tried at original position */
BACKCHAR(Feptr);
}
}
@@ -2126,11 +2130,15 @@
if (reptype == REPTYPE_POS) continue; /* No backtracking */
+ /* After \C in UTF mode, Lstart_eptr might be in the middle of a
+ Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't
+ go too far. */
+
for(;;)
{
RMATCH(Fecode, RM101);
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
- if (Feptr-- == Lstart_eptr) break; /* Tried at original position */
+ if (Feptr-- <= Lstart_eptr) break; /* Tried at original position */
#ifdef SUPPORT_UNICODE
if (utf) BACKCHAR(Feptr);
#endif
@@ -4002,8 +4010,8 @@
if (reptype == REPTYPE_POS) continue; /* No backtracking */
/* After \C in UTF mode, Lstart_eptr might be in the middle of a
- Unicode character. Use <= pp to ensure backtracking doesn't go too far.
- */
+ Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't
+ go too far. */
for(;;)
{
Modified: code/trunk/testdata/testinput22
===================================================================
--- code/trunk/testdata/testinput22 2018-02-19 17:00:45 UTC (rev 917)
+++ code/trunk/testdata/testinput22 2018-02-19 17:26:33 UTC (rev 918)
@@ -99,4 +99,7 @@
\= Expect no match - tests \C at end of subject
ab
+/\C[^\v]+\x80/utf
+ [AΏBŀC]
+
# End of testinput22
Modified: code/trunk/testdata/testoutput22-16
===================================================================
--- code/trunk/testdata/testoutput22-16 2018-02-19 17:00:45 UTC (rev 917)
+++ code/trunk/testdata/testoutput22-16 2018-02-19 17:26:33 UTC (rev 918)
@@ -172,4 +172,8 @@
ab
No match
+/\C[^\v]+\x80/utf
+ [AΏBŀC]
+No match
+
# End of testinput22
Modified: code/trunk/testdata/testoutput22-32
===================================================================
--- code/trunk/testdata/testoutput22-32 2018-02-19 17:00:45 UTC (rev 917)
+++ code/trunk/testdata/testoutput22-32 2018-02-19 17:26:33 UTC (rev 918)
@@ -170,4 +170,8 @@
ab
No match
+/\C[^\v]+\x80/utf
+ [AΏBŀC]
+No match
+
# End of testinput22
Modified: code/trunk/testdata/testoutput22-8
===================================================================
--- code/trunk/testdata/testoutput22-8 2018-02-19 17:00:45 UTC (rev 917)
+++ code/trunk/testdata/testoutput22-8 2018-02-19 17:26:33 UTC (rev 918)
@@ -174,4 +174,8 @@
ab
No match
+/\C[^\v]+\x80/utf
+ [AΏBŀC]
+No match
+
# End of testinput22
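
Illustrative note on the == to <= change in pcre2_match.c: the following is a minimal sketch, not PCRE2 code, of why the loop-exit test matters when the repeat starts in the middle of a UTF-8 character. The helper names (back_to_char_start, count_backtrack_steps) are hypothetical, array indices stand in for the Feptr and Lstart_eptr pointers, and the "pos < 0" guard is added only so that the sketch reports the overrun instead of reading before the buffer.

```c
#include <stddef.h>

/* Hypothetical stand-in for BACKCHAR: step back over UTF-8 continuation
   bytes (10xxxxxx) until the index is on the first byte of a character. */
static ptrdiff_t back_to_char_start(const unsigned char *s, ptrdiff_t i)
{
  while (i > 0 && (s[i] & 0xC0) == 0x80) i--;
  return i;
}

/* Walk back from 'end' towards 'start', one character at a time, as a
   greedy repeat does when it gives back what it matched. Returns the
   number of positions tried, or -1 if the index would fall below zero,
   which is the overrun that the <= comparison prevents. */
static int count_backtrack_steps(const unsigned char *s, ptrdiff_t start,
  ptrdiff_t end, int use_less_equal)
{
  ptrdiff_t pos = end;
  int tried = 0;
  for (;;)
  {
    ptrdiff_t old;
    tried++;                      /* stands in for RMATCH at this position */
    old = pos--;                  /* compare the pre-decrement value */
    if (use_less_equal ? (old <= start) : (old == start)) break;
    if (pos < 0) return -1;       /* walked past the start of the subject */
    pos = back_to_char_start(s, pos);
  }
  return tried;
}
```

With the test subject written as bytes ("[A\xCE\x8F" "B\xC5\x80" "C]", nine code units), a start index of 3 is the second byte of the first two-byte character, which is where the repeat begins after \C has consumed one code unit. Walking back from index 9, the character-aligned step moves from 4 to 2 and never lands on 3, so the == test never fires and the sketch returns -1, while the <= test stops after six positions. When the start index is on a character boundary (for example 4), both comparisons stop at the same place.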