[pcre-dev] [Bug 1020] New: Wrong pcre_exec() with some UTF8 …

Author: Dmitry Ukolov
Date:
To: pcre-dev
Subject: [pcre-dev] [Bug 1020] New: Wrong pcre_exec() with some UTF8 chars in the pattern
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1020
           Summary: Wrong pcre_exec() with some UTF8 chars in the pattern
           Product: PCRE
           Version: 8.10
          Platform: x86-64
        OS/Version: Windows
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: udmitry@???
                CC: pcre-dev@???



Hi, Philip,

I am getting wrong pcre_exec() results in UTF-8 mode with PCRE_NEWLINE_ANY
and PCRE_EXTENDED.

The pattern is 6 bytes long (4 UTF-8 characters): one literal character
followed by a 3-character comment:
p = 'A#\xd1\x85\xd1\x86';
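
The byte and character counts above can be checked with a small sketch. It is written in Python purely for illustration (the report itself uses Pascal), and the variable names are hypothetical:

```python
# Pattern bytes exactly as given in the report.
pattern = b"A#\xd1\x85\xd1\x86"

# Decode as UTF-8 to count characters rather than bytes.
text = pattern.decode("utf-8")

print(len(pattern))                        # byte length
print(len(text))                           # character count
print([f"U+{ord(c):04X}" for c in text])   # code point of each character
```

The two non-ASCII characters are U+0445 and U+0446, each encoded as two bytes. With PCRE_EXTENDED set, everything from the # onward is a comment, so only the A is significant.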

The subject line has 3 characters (English letters):
TestLine := 'BAB';

The test code (Pascal-based) is:
/************************/
re := pcre_compile(P, PCRE_EXTENDED or PCRE_UTF8 or PCRE_NEWLINE_ANY,
                   @ePtr, @eo, nil);
if re <> nil then
  Cnt := pcre_exec(re, nil, TestLine, Length(TestLine), 0, PCRE_NOTEMPTY,
                   @(Vector[0]), Length(Vector));
/************************/

In this case Cnt == -1 (PCRE_ERROR_NOMATCH), but I expect 1.
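
For comparison, the expected behaviour of an extended-mode comment can be sketched in a different engine. The block below uses Python's re module with re.VERBOSE. This is not PCRE and says nothing about PCRE internals. It only illustrates that when a # comment runs to the end of the pattern, the effective pattern is just A, which should match the A in BAB:

```python
import re

# Same pattern as in the report, decoded from its UTF-8 bytes.
pattern = b"A#\xd1\x85\xd1\x86".decode("utf-8")
subject = "BAB"

# re.VERBOSE is Python's counterpart to extended mode: an unescaped '#'
# starts a comment that runs to the end of the line.
m = re.search(pattern, subject, re.VERBOSE)

print(m.group())  # the matched text
print(m.span())   # start and end offsets of the match
```

Here the match is the single character A at offsets 1 to 2, which corresponds to the return value of 1 expected from pcre_exec().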

If I change the pattern symbol \xd1\x85 to another symbol, it works fine. For
example:
p = 'A#\xd1\x86\xd1\x86';
...
Cnt == 1 !!!

If I remove the PCRE_NEWLINE_ANY option from pcre_compile(), it works too.
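
One byte-level difference between the failing and the working pattern may be worth noting. This is an observation only, not a confirmed diagnosis. The failing pattern contains the byte 0x85, which is also the code point of NEL (U+0085), one of the newlines PCRE_NEWLINE_ANY recognizes. The working pattern has no such byte. A minimal sketch in Python, with hypothetical variable names:

```python
failing = b"A#\xd1\x85\xd1\x86"   # pattern that returns -1 in the report
working = b"A#\xd1\x86\xd1\x86"   # pattern that returns 1 in the report

NEL = 0x85  # code point of U+0085 NEXT LINE

# Does the raw byte 0x85 occur anywhere in each pattern?
print(NEL in failing)
print(NEL in working)

# A real NEL character is encoded differently in UTF-8 (two bytes).
print("\u0085".encode("utf-8"))
```

In valid UTF-8, NEL is the two-byte sequence C2 85, so the 85 byte in the failing pattern is only the trailing byte of U+0445 and not a newline. Whether the comment-skipping code actually confuses the two is an assumption that would need checking against the PCRE source.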

Thanks,
Dmitry Ukolov.

