https://bugs.exim.org/show_bug.cgi?id=2108
Petr Pisar <ppisar@???> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ppisar@???
--- Comment #1 from Petr Pisar <ppisar@???> ---
Content of the file is:
00000000 2f 5c 43 5b 5e 30 30 5d 2a 5e 74 2f 75 74 66 0a |/\C[^00]*^t/utf.|
00000010 f3 a4 8c 97 74 0a |....t.|
00000016
The expression starts with \C, and the subject string begins with a character
encoded in more than one code unit (its first four bytes, f3 a4 8c 97, encode
U+E4317). pcre2pattern(3) reads:
Because \C breaks up characters into individual code units, matching
one unit with \C in UTF-8 or UTF-16 mode means that the rest of the
string may start with a malformed UTF character. This has undefined
results, because PCRE2 assumes that it is matching character by character
in a valid UTF string (by default it checks the subject string's
validity at the start of processing unless the PCRE2_NO_UTF_CHECK
option is used).
I believe you hit documented undefined behavior.
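
To illustrate why consuming a single code unit is a problem, here is a small
Python sketch. It does not call PCRE2; it only models the byte-level effect of
\C on the subject from the hexdump, and the helper name is_valid_utf8 is made
up for this example.

```python
# Subject bytes from the hexdump at offset 0x10 (f3 a4 8c 97 74),
# without the trailing newline.
subject = bytes.fromhex("f3a48c9774")

# The whole subject is valid UTF-8: one four-byte character followed by "t".
decoded = subject.decode("utf-8")
code_points = [f"U+{ord(ch):04X}" for ch in decoded]


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Model \C consuming exactly one code unit (one byte in UTF-8 mode).
rest_after_one_unit = subject[1:]

print(code_points)                         # ['U+E4317', 'U+0074']
print(is_valid_utf8(subject))              # True
print(is_valid_utf8(rest_after_one_unit))  # False: starts with a continuation byte
```

After one byte is consumed, the remainder begins with a4, a continuation byte
with no lead byte before it, which is the malformed start that the manual page
describes.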