http://bugs.exim.org/show_bug.cgi?id=933
Summary: Multibyte symbols in bracket expressions are treated as
separate 1-byte symbols
Product: PCRE
Version: N/A
Platform: Other
OS/Version: Linux
Status: NEW
Severity: bug
Priority: medium
Component: Code
AssignedTo: ph10@???
ReportedBy: vopros@???
CC: pcre-dev@???
In UTF-8 locales, bracket expressions containing non-ASCII characters are
matched byte by byte, as if each byte were a separate single-byte character.
For example, '[бв]', which is encoded as \xd0\xb1\xd0\xb2, is treated as any
one of the bytes \xd0, \xb1 or \xb2 rather than as either of the two-byte
sequences \xd0\xb1 or \xd0\xb2.
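To show the difference outside of PCRE, here is a rough sketch using Python's
re module. It is only an analogy and not part of the original report: a
pattern compiled from raw bytes plays the role of the byte-wise behaviour
described above, and a pattern compiled from text plays the role of the
expected character-wise behaviour. The subject string "абвг" is made up for
the illustration and is not taken from the attached file.

```python
import re

# Hypothetical sample data, chosen only for this illustration.
pattern_text = "[бв]"
subject_text = "абвг"

# Byte-wise matching: the class is compiled from the UTF-8 bytes
# \xd0\xb1\xd0\xb2, so it contains the single bytes \xd0, \xb1 and \xb2.
byte_pattern = re.compile(pattern_text.encode("utf-8"))
byte_matches = byte_pattern.findall(subject_text.encode("utf-8"))

# Character-wise matching: the class contains the two characters б and в.
char_pattern = re.compile(pattern_text)
char_matches = char_pattern.findall(subject_text)

print(byte_matches)
print(char_matches)
```

The byte-wise pattern reports individual bytes, including the \xd0 lead byte
of 'а' and 'г', which are not in the class at all. The character-wise pattern
reports only 'б' and 'в'.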
To reproduce, run "pcregrep -o '[бв]' random-symbols.txt" on the attached file.
Observed on libpcre versions 7.9 and 8.00, Gentoo Linux on AMD64.