[pcre-dev] [Bug 633] New: pcre_get_substring*: Problems with…

Author: Olaf Walkowiak
Date:  
To: pcre-dev
New-Topics: [pcre-dev] [Bug 633] pcre_get_substring*: Problems with UTF8
Subject: [pcre-dev] [Bug 633] New: pcre_get_substring*: Problems with UTF8
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=633
           Summary: pcre_get_substring*: Problems with UTF8
           Product: PCRE
           Version: N/A
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: olaf@???
                CC: pcre-dev@???



The pcre_get_substring family of functions has problems if the "haystack" is
UTF8. The ovector offsets from pcreRegexExecute are treated as single-byte
offsets, so if "haystack" contains UTF8 characters, the result is truncated.

Extracting the strings with xmlUTF8Strsub (from libxml) works as expected.
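
To illustrate the kind of mismatch that could produce this, here is a minimal
sketch in plain C with no PCRE or libxml calls. It assumes, as the PCRE
documentation describes, that ovector holds byte offsets into the subject, and
that xmlUTF8Strsub takes its start and length in characters. The helper names
utf8_chars_in_bytes and copy_byte_range are hypothetical and exist only for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the UTF-8 characters contained in the first
 * nbytes bytes of s. Assumes s is valid UTF-8 and nbytes falls on a
 * character boundary. */
static size_t utf8_chars_in_bytes(const char *s, size_t nbytes)
{
    size_t chars = 0;
    for (size_t i = 0; i < nbytes; i++) {
        /* Continuation bytes look like 10xxxxxx; any other byte starts
         * a new character. */
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}

/* Hypothetical helper: copy the byte range [start, end) of subject into a
 * freshly allocated NUL-terminated string. This treats the offsets as
 * byte offsets. The caller frees the result. */
static char *copy_byte_range(const char *subject, size_t start, size_t end)
{
    assert(start <= end && end <= strlen(subject));
    size_t len = end - start;
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, subject + start, len);
    out[len] = '\0';
    return out;
}
```

For a subject in which each non-ASCII character occupies two bytes, a match
that begins at byte offset 7 may begin at character offset 6. Passing a byte
offset to a routine that counts characters, or a character offset to one that
counts bytes, shifts or shortens the extracted string. Converting with
something like utf8_chars_in_bytes before calling a character-based function
keeps the two views consistent.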


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email