Revision: 447
http://www.exim.org/viewvc/pcre2?view=rev&revision=447
Author: ph10
Date: 2015-11-27 15:58:44 +0000 (Fri, 27 Nov 2015)
Log Message:
-----------
Fix bad offset value in invalid UTF pattern error.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/src/pcre2_compile.c
code/trunk/src/pcre2_error.c
code/trunk/testdata/testinput10
code/trunk/testdata/testoutput10
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2015-11-25 18:46:35 UTC (rev 446)
+++ code/trunk/ChangeLog 2015-11-27 15:58:44 UTC (rev 447)
@@ -332,9 +332,12 @@
99. If PCRE2_AUTO_CALLOUT was set on a pattern that had a (?# comment between
an item and its qualifier (for example, A(?#comment)?B) pcre2_compile()
-misbehaved.
+misbehaved. This bug was found by the LLVM fuzzer.
+100. The error for an invalid UTF pattern string always gave the code unit
+offset as zero instead of where the invalidity was found.
+
Version 10.20 30-June-2015
--------------------------
Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c 2015-11-25 18:46:35 UTC (rev 446)
+++ code/trunk/src/pcre2_compile.c 2015-11-27 15:58:44 UTC (rev 447)
@@ -8468,7 +8468,7 @@
}
if ((options & PCRE2_NO_UTF_CHECK) == 0 &&
(errorcode = PRIV(valid_utf)(pattern, patlen, erroroffset)) != 0)
- goto HAD_ERROR;
+ goto HAD_UTF_ERROR;
}
/* Check UCP lockout. */
@@ -8849,10 +8849,11 @@
if (errorcode != 0)
{
HAD_ERROR:
+ *erroroffset = (int)(ptr - pattern);
+ HAD_UTF_ERROR:
+ *errorptr = errorcode;
pcre2_code_free(re);
re = NULL;
- *errorptr = errorcode;
- *erroroffset = (int)(ptr - pattern);
goto EXIT;
}
Modified: code/trunk/src/pcre2_error.c
===================================================================
--- code/trunk/src/pcre2_error.c 2015-11-25 18:46:35 UTC (rev 446)
+++ code/trunk/src/pcre2_error.c 2015-11-27 15:58:44 UTC (rev 447)
@@ -204,7 +204,7 @@
/* 20 */
"UTF-8 error: overlong 5-byte sequence\0"
"UTF-8 error: overlong 6-byte sequence\0"
- "UTF-8 error: isolated 0x80 byte\0"
+ "UTF-8 error: isolated byte with 0x80 bit set\0"
"UTF-8 error: illegal byte (0xfe or 0xff)\0"
"UTF-16 error: missing low surrogate at end\0"
/* 25 */
Modified: code/trunk/testdata/testinput10
===================================================================
--- code/trunk/testdata/testinput10 2015-11-25 18:46:35 UTC (rev 446)
+++ code/trunk/testdata/testinput10 2015-11-27 15:58:44 UTC (rev 447)
@@ -1,7 +1,7 @@
# This set of tests is for UTF-8 support and Unicode property support, with
# relevance only for the 8-bit library.
-# The next 3 patterns have UTF-8 errors
+# The next 4 patterns have UTF-8 errors
/[\xC3]/utf
@@ -9,6 +9,8 @@
/\xC3\xC3\xC3xxx/utf
+/Â\x82\x82\x82\x82\x82\x82\x82\xC3/utf
+
# Now test subjects
/badutf/utf
Modified: code/trunk/testdata/testoutput10
===================================================================
--- code/trunk/testdata/testoutput10 2015-11-25 18:46:35 UTC (rev 446)
+++ code/trunk/testdata/testoutput10 2015-11-27 15:58:44 UTC (rev 447)
@@ -1,10 +1,10 @@
# This set of tests is for UTF-8 support and Unicode property support, with
# relevance only for the 8-bit library.
-# The next 3 patterns have UTF-8 errors
+# The next 4 patterns have UTF-8 errors
/[\xC3]/utf
-Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80
+Failed: error -8 at offset 1: UTF-8 error: byte 2 top bits not 0x80
/\xC3/utf
Failed: error -3 at offset 0: UTF-8 error: 1 byte missing at end
@@ -12,6 +12,9 @@
/\xC3\xC3\xC3xxx/utf
Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80
+/Â\x82\x82\x82\x82\x82\x82\x82\xC3/utf
+Failed: error -22 at offset 2: UTF-8 error: isolated byte with 0x80 bit set
+
# Now test subjects
/badutf/utf
@@ -89,7 +92,7 @@
\xfc\x80\x80\x80\x80\x8f
Failed: error -21: UTF-8 error: overlong 6-byte sequence at offset 0
\x80
-Failed: error -22: UTF-8 error: isolated 0x80 byte at offset 0
+Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 0
\xfe
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0
\xff
@@ -1534,6 +1537,6 @@
First code unit = 'x'
Subject length lower bound = 1
a\x80zx\=offset=3
-Failed: error -22: UTF-8 error: isolated 0x80 byte at offset 1
+Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 1
# End of testinput10