[pcre-dev] [Bug 2581] New: Hangs on UTF8 string

Author: admin
To: pcre-dev
Subject: [pcre-dev] [Bug 2581] New: Hangs on UTF8 string

            Bug ID: 2581
           Summary: Hangs on UTF8 string
           Product: PCRE
           Version: 10.35 (PCRE2)
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: security
          Priority: medium
         Component: Code
          Assignee: ph10@???
          Reporter: syzop@???
                CC: pcre-dev@???

In my IRC program we deal with byte streams that are not necessarily UTF8. So we
pass both the match string and the regex as regular "char *", without checking
whether they are valid UTF8, and we used the option

I was enthusiastic about the new PCRE2_MATCH_INVALID_UTF feature added in
10.34, so I implemented that in our program. Unfortunately, now that we are
using it, the daemon hangs on certain UTF8 strings.

pcre2_compile(str, PCRE2_ZERO_TERMINATED, options, &errorcode, &erroroffset,

The example regex is: (^one|.*two).*
The string is: ça

From what I understand, it should not hang or crash, but it does (the backtrace
is useless, I am afraid):
"If you pass an invalid UTF string when PCRE2_NO_UTF_CHECK is set, the result
is undefined and your program may crash or loop indefinitely or give incorrect
results. There is, however, one mode of matching that can handle invalid UTF
subject strings. This is enabled by passing PCRE2_MATCH_INVALID_UTF to
pcre2_compile() and is discussed below in the next section."

Actually, it isn't even an invalid UTF8 string, but a valid c with cedilla.
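
To back up the claim that the subject is well-formed, here is a minimal sketch
of a standalone UTF-8 check in plain C. It does not use PCRE2 at all. The
function name is_valid_utf8 is made up for this illustration, and the byte
sequence 0xC3 0xA7 0x61 is an assumption about how "ça" was encoded (precomposed
U+00E7 followed by "a"); the report itself does not list the raw bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE2: returns 1 if the n bytes at s
 * are well-formed UTF-8 as defined by RFC 3629, otherwise 0. */
static int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t extra;        /* number of continuation bytes expected */
        unsigned long cp;    /* code point being decoded */

        if (c < 0x80) {      /* plain ASCII byte */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            extra = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            extra = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            extra = 3; cp = c & 0x07;
        } else {
            return 0;        /* stray continuation, 0xC0/0xC1, or above 0xF4 */
        }

        if (n - i - 1 < extra)
            return 0;        /* sequence is cut off at the end of the buffer */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char t = s[i + k];
            if ((t & 0xC0) != 0x80)
                return 0;    /* not a continuation byte */
            cp = (cp << 6) | (t & 0x3F);
        }

        if (extra == 2 && cp < 0x800)   return 0;   /* overlong 3-byte form */
        if (extra == 3 && cp < 0x10000) return 0;   /* overlong 4-byte form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* UTF-16 surrogate */
        if (cp > 0x10FFFF) return 0;                /* beyond Unicode range */

        i += extra + 1;
    }
    return 1;
}
```

Under that encoding assumption, calling is_valid_utf8 on the three bytes
{0xC3, 0xA7, 0x61} returns 1, while the Latin-1 form {0xE7, 0x61} returns 0
because 0xE7 announces a three-byte sequence that is cut short.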

I can reproduce on both 10.34 and 10.35. These are the exact configure options
we use (we use JIT): ./configure --enable-jit --enable-shared

For now I can only reproduce the issue with () groups, which is probably why we
missed it during QA with other regexes.
