Revision: 319
http://www.exim.org/viewvc/pcre2?view=rev&revision=319
Author: ph10
Date: 2015-07-20 11:17:23 +0100 (Mon, 20 Jul 2015)
Log Message:
-----------
Fix another fuzzer bug.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/src/pcre2_compile.c
code/trunk/testdata/testinput2
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2015-07-20 07:38:06 UTC (rev 318)
+++ code/trunk/ChangeLog 2015-07-20 10:17:23 UTC (rev 319)
@@ -53,7 +53,13 @@
14. Fix infinite recursion in the JIT compiler when certain patterns such as
/(?:|a|){100}x/ are analysed.
+15. Some patterns with character classes involving [: and \\ were incorrectly
+compiled and could cause reading from uninitialized memory or an incorrect
error diagnosis. Examples are: /[[:\\](?<[::]/ and /[[:\\](?'abc')[a:]/. The
+first of these bugs was discovered by Karl Skomski with the LLVM fuzzer.
+
+
Version 10.20 30-June-2015
--------------------------
Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c 2015-07-20 07:38:06 UTC (rev 318)
+++ code/trunk/src/pcre2_compile.c 2015-07-20 10:17:23 UTC (rev 319)
@@ -2574,11 +2574,11 @@
The problem in trying to be exactly like Perl is in the handling of escapes. We
have to be sure that [abc[:x\]pqr] is *not* treated as containing a POSIX
class, but [abc[:x\]pqr:]] is (so that an error can be generated). The code
-below handles the special case of \], but does not try to do any other escape
-processing. This makes it different from Perl for cases such as [:l\ower:]
-where Perl recognizes it as the POSIX class "lower" but PCRE does not recognize
-"l\ower". This is a lesser evil than not diagnosing bad classes when Perl does,
-I think.
+below handles the special cases \\ and \], but does not try to do any other
+escape processing. This makes it different from Perl for cases such as
+[:l\ower:] where Perl recognizes it as the POSIX class "lower" but PCRE does
+not recognize "l\ower". This is a lesser evil than not diagnosing bad classes
+when Perl does, I think.
A user pointed out that PCRE was rejecting [:a[:digit:]] whereas Perl was not.
It seems that the appearance of a nested POSIX class supersedes an apparent
@@ -2606,7 +2606,9 @@
for (++ptr; *ptr != CHAR_NULL; ptr++)
{
- if (*ptr == CHAR_BACKSLASH && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET) ptr++;
+ if (*ptr == CHAR_BACKSLASH &&
+ (ptr[1] == CHAR_RIGHT_SQUARE_BRACKET || ptr[1] == CHAR_BACKSLASH))
+ ptr++;
else if (*ptr == CHAR_RIGHT_SQUARE_BRACKET) return FALSE;
else
{
@@ -3010,16 +3012,16 @@
for (; ptr < cb->end_pattern; ptr++)
{
c = *ptr;
-
- /* Parenthesized groups set skiptoket when all following characters up to the
- next closing parenthesis must be ignored. The parenthesis itself must be
- processed (to end the nested parenthesized item). */
-
+
+ /* Parenthesized groups set skiptoket when all following characters up to the
+ next closing parenthesis must be ignored. The parenthesis itself must be
+ processed (to end the nested parenthesized item). */
+
if (skiptoket)
{
if (c != CHAR_RIGHT_PARENTHESIS) continue;
skiptoket = FALSE;
- }
+ }
/* Skip over literals */
@@ -3117,6 +3119,8 @@
for (;;)
{
+ PCRE2_SPTR tempptr;
+
if (c == CHAR_NULL && ptr >= cb->end_pattern)
{
errorcode = ERR6; /* Missing terminating ']' */
@@ -3143,12 +3147,11 @@
}
/* Skip POSIX class names. */
-
if (c == CHAR_LEFT_SQUARE_BRACKET &&
(ptr[1] == CHAR_COLON || ptr[1] == CHAR_DOT ||
- ptr[1] == CHAR_EQUALS_SIGN) && check_posix_syntax(ptr, &ptr))
+ ptr[1] == CHAR_EQUALS_SIGN) && check_posix_syntax(ptr, &tempptr))
{
- ptr++;
+ ptr = tempptr + 1;
}
else if (c == CHAR_BACKSLASH)
{
@@ -3189,13 +3192,13 @@
default:
ptr += 2;
if (ptr[0] == CHAR_R || /* (?R) */
- ptr[0] == CHAR_NUMBER_SIGN || /* (?#) */
+ ptr[0] == CHAR_NUMBER_SIGN || /* (?#) */
IS_DIGIT(ptr[0]) || /* (?n) */
(ptr[0] == CHAR_MINUS && IS_DIGIT(ptr[1]))) /* (?-n) */
{
skiptoket = TRUE;
break;
- }
+ }
/* Handle (?| and (?imsxJU: which are the only other valid forms. Both
need a new block on the nest stack. */
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2015-07-20 07:38:06 UTC (rev 318)
+++ code/trunk/testdata/testinput2 2015-07-20 10:17:23 UTC (rev 319)
@@ -4346,4 +4346,8 @@
/((?x)(?#))#(?'abc')/I
+/[[:\\](?<[::]/
+
+/[[:\\](?'abc')[a:]/I
+
# End of testinput2
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2015-07-20 07:38:06 UTC (rev 318)
+++ code/trunk/testdata/testoutput2 2015-07-20 10:17:23 UTC (rev 319)
@@ -14524,4 +14524,14 @@
First code unit = '#'
Subject length lower bound = 1
+/[[:\\](?<[::]/
+Failed: error 124 at offset 9: unrecognized character after (?<
+
+/[[:\\](?'abc')[a:]/I
+Capturing subpattern count = 1
+Named capturing subpatterns:
+ abc 1
+Starting code units: : [ \
+Subject length lower bound = 2
+
# End of testinput2
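
Illustrative note on the check_posix_syntax() change: the sketch below is a simplified, hypothetical stand-in for that scanner, not the code in pcre2_compile.c. It uses plain char instead of PCRE2_SPTR, omits the nested POSIX class check, and takes a skip_double_backslash flag (an assumption added here) so the behaviour before and after the change can be compared. The function name posix_syntax_sketch is also invented for this note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified scanner. p points at the '[' that starts a
   possible "[:", "[." or "[=" item. With skip_double_backslash false only
   \] is stepped over (behaviour before this revision); with it true, \\ is
   stepped over as well (behaviour after this revision). */
static bool posix_syntax_sketch(const char *p, const char **end,
                                bool skip_double_backslash)
{
    char terminator;

    assert(p != NULL && end != NULL);
    terminator = *(++p);            /* ':', '.' or '=' */

    for (++p; *p != '\0'; p++) {
        if (*p == '\\' &&
            (p[1] == ']' || (skip_double_backslash && p[1] == '\\'))) {
            p++;                    /* step over the escaped character */
        } else if (*p == ']') {
            return false;           /* class closed before any ":]" was seen */
        } else if (*p == terminator && p[1] == ']') {
            *end = p;               /* points at the terminator, not at ']' */
            return true;
        }
    }
    return false;                   /* ran off the end of the pattern */
}
```

For the second new test pattern, [[:\\](?'abc')[a:], starting at the second '[': without the \\ case the scanner reads the second backslash as escaping ']', runs on through (?'abc') and reports a POSIX class ending at the final ":]". With the \\ case the ']' closes the class and the scanner returns false, so the named group is seen. In this sketch *end is written only on success; the tempptr change in the diff means the caller's own ptr is no longer the variable handed to the scanner, and is updated (ptr = tempptr + 1) only after a TRUE return.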