https://bugs.exim.org/show_bug.cgi?id=2581
Bug ID: 2581
Summary: Hangs on UTF8 string
Product: PCRE
Version: 10.35 (PCRE2)
Hardware: x86
OS: Linux
Status: NEW
Severity: security
Priority: medium
Component: Code
Assignee: ph10@???
Reporter: syzop@???
CC: pcre-dev@???
In my IRC program we deal with byte streams that are not necessarily UTF-8. We
pass both the subject string and the regex as plain "char *", without checking
whether they are valid UTF-8, and until now we used the options
PCRE2_CASELESS|PCRE2_NEVER_UTF|PCRE2_NEVER_UCP.
I was enthusiastic about the new PCRE2_MATCH_INVALID_UTF feature added in
10.34, so I implemented that in our program. Unfortunately, now that we are
using it, the daemon hangs on certain UTF8 strings.
We now compile with PCRE2_CASELESS|PCRE2_MATCH_INVALID_UTF as the options:
pcre2_compile(str, PCRE2_ZERO_TERMINATED, options, &errorcode, &erroroffset,
NULL);
The example regex is: (^one|.*two).*
The string is: ça
From what I understand of the documentation, it should not hang or crash, but
it does (the backtrace is useless, I am afraid). The documentation says:
"If you pass an invalid UTF string when PCRE2_NO_UTF_CHECK is set, the result
is undefined and your program may crash or loop indefinitely or give incorrect
results. There is, however, one mode of matching that can handle invalid UTF
subject strings. This is enabled by passing PCRE2_MATCH_INVALID_UTF to
pcre2_compile() and is discussed below in the next section."
Actually, the subject isn't even an invalid UTF-8 string: it is a valid c with
cedilla followed by "a".
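To back up the claim that the subject is well-formed, here is a minimal sketch
of a strict UTF-8 check following RFC 3629. It is not PCRE2's own validity
check; the names is_valid_utf8() and report_subject_is_valid_utf8() are
hypothetical helpers written for this report, and the byte values assume the
subject was sent as UTF-8 (0xC3 0xA7 for the c with cedilla, then 0x61).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE2: returns 1 if the len bytes at s
 * are well-formed UTF-8 as defined by RFC 3629, and 0 otherwise. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;          /* number of continuation bytes expected */
        unsigned long cp;     /* code point being decoded */
        unsigned long min;    /* smallest code point allowed for this length */

        if (c < 0x80) {                       /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {      /* 2-byte sequence */
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {      /* 3-byte sequence */
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {      /* 4-byte sequence */
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                         /* stray continuation or bad lead */
        }

        if (len - i - 1 < need)
            return 0;                         /* sequence is truncated */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;                     /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values above U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;

        i += need + 1;
    }
    return 1;
}

/* The subject from this report, assuming it is encoded as UTF-8. */
int report_subject_is_valid_utf8(void)
{
    static const unsigned char subject[] = { 0xC3, 0xA7, 'a' };
    return is_valid_utf8(subject, sizeof subject);
}
```

The same subject in ISO-8859-1 would be the two bytes 0xE7 0x61, which this
check rejects because 0xE7 announces a 3-byte sequence that never completes.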
I can reproduce on both 10.34 and 10.35. These are the exact configure options
we use (we use JIT): ./configure --enable-jit --enable-shared
For now I can only reproduce the issue with () groups, which is probably why we
missed it during QA with other regexes.