[pcre-dev] [Bug 1049] Add support for UTF-16

Author: Thorsten Schöning
Date:  
To: pcre-dev
Old-Topics: [pcre-dev] [Bug 1049] New: Add support for UTF-16
Subject: [pcre-dev] [Bug 1049] Add support for UTF-16
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1049

Thorsten Schöning <tschoening@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tschoening@???

--- Comment #17 from Thorsten Schöning <tschoening@???> 2011-11-14 13:51:20 ---
(In reply to comment #15)
> On Mon, 14 Nov 2011, Zoltan Herczeg wrote:
>
> I would like to allocate a new flag so that we can detect a 16-bit
> pattern that is erroneously passed to an 8-bit matcher. (I just have the
> feeling that somebody is sure to come up with an application that
> handles both sizes.)


If I understood you correctly, I'm the one handling both sizes. :-) In one of
my applications we worked with standard char in the Windows-1252 code page and
needed to support Unicode in some places, so we decided to use what Windows
supports by default, which is wchar_t as a 16-bit data type with UTF-16 encoding.
This resulted in classes that work with PCRE on Windows-1252 and UTF-8 encoded
strings at the same time. If future versions of PCRE support UTF-16, or at least
16-bit wide strings, natively, I would consider using those instead of
converting our strings to UTF-8 before passing them to PCRE. But of course I
could only do this for new code, and I would really appreciate it if all modes
could be used in the same class.

Just as a hint: in January this year I asked for std::wstring support in
pcrecpp. Is this something to reconsider now? The thread was titled
"implementing support for std::wstring in pcrecpp".


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email