
Author: Makar
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 933] New: Multibyte symbols in bracket expressions are treated as separate 1-byte symbols
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=933
           Summary: Multibyte symbols in bracket expressions are treated as
                    separate 1-byte symbols
           Product: PCRE
           Version: N/A
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: vopros@???
                CC: pcre-dev@???



In UTF-8 locales, bracket expressions containing non-ASCII characters are
matched byte by byte, as if each byte were a separate single-byte character.

For example, '[бв]', which is encoded as \xd0\xb1\xd0\xb2, is treated as
matching any one of the bytes \xd0, \xb1 or \xb2, rather than either of the
two-byte sequences \xd0\xb1 or \xd0\xb2.
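
The difference can be illustrated outside PCRE. The following is only a rough
sketch using Python's standard re module, not PCRE or pcregrep: compiling the
class as a bytes pattern mimics the byte-wise behaviour described above, while
compiling it as a str pattern gives the character-wise behaviour one would
expect. The helper names byte_mode_matches and char_mode_matches are made up
for this illustration.

```python
import re

# The bracket expression from the report; in UTF-8 it is \xd0\xb1\xd0\xb2.
CLASS = "[бв]"


def byte_mode_matches(text):
    """Match on raw UTF-8 bytes: the class becomes the byte set {d0, b1, b2}."""
    pattern = re.compile(CLASS.encode("utf-8"))
    return pattern.findall(text.encode("utf-8"))


def char_mode_matches(text):
    """Match on decoded characters: the class is the character set {б, в}."""
    return re.findall(CLASS, text)


if __name__ == "__main__":
    # "а" is \xd0\xb0 and "б" is \xd0\xb1, so only "б" should match.
    sample = "аб"
    print(byte_mode_matches(sample))  # individual bytes, including the \xd0 of "а"
    print(char_mode_matches(sample))  # whole characters only
```

In byte mode the lead byte \xd0 of any Cyrillic letter in that range matches
on its own, which is the same kind of spurious output the pcregrep command
below produces.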

Try running “pcregrep -o '[бв]' random-symbols.txt” on the attached file.

Observed on libpcre versions 7.9 and 8.00, Gentoo Linux on AMD64.


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email