[pcre-dev] [Bug 1049] Add support for UTF-16

Author: Zoltan Herczeg
Date:  
To: pcre-dev
Old-Topics: [pcre-dev] [Bug 1049] New: Add support for UTF-16
Subject: [pcre-dev] [Bug 1049] Add support for UTF-16

http://bugs.exim.org/show_bug.cgi?id=1049




--- Comment #13 from Zoltan Herczeg <hzmester@???> 2011-11-14 11:48:56 ---
> What do people think?


Well, I proposed exactly the same thing above. I am really happy that we
arrived at the same solution. I think char/UTF-8 and char16/UTF-16 stand in
the same relation, so both libraries should allow selecting between them.

- We should choose between the two modes at configure time. Configure can be
run from a separate build directory, so creating both libraries from the
same source is no trouble (I usually do this because I don't like mixing source
and object files anyway).

- pcre.h: We should provide a pcre16.h for the 16-bit library. The two headers
shouldn't be included at the same time (although different C files can include
different ones). Perhaps this would not cause any trouble, but who knows.

- LINK_SIZE 3 would be the same as LINK_SIZE 4 in 16-bit mode.

- PCRE_UTF8 would be replaced by PCRE_UTF16 in pcre16.h. (No need to allocate a
new flag.)

- UTF-16 would use slightly more memory for patterns than UTF-8, but I
don't think the target audience would worry about this.

- If UCP is disabled, we can only set the properties for the first 256 chars,
even in char16 mode.
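
Two of the points above (reusing the PCRE_UTF8 bit and the LINK_SIZE remark)
can be illustrated with a minimal sketch. The SKETCH_ names and the helper are
hypothetical, and the rounding rule is only one reading of the LINK_SIZE
remark: it assumes links are stored in whole 16-bit units, so a 3-byte link
rounds up to two units, the same as a 4-byte link.

```c
#include <assert.h>

/* Hypothetical names. The value mirrors PCRE_UTF8 in pcre.h; a 16-bit
   header could give the same bit a new name instead of a new flag. */
#define SKETCH_UTF8   0x00000800
#define SKETCH_UTF16  SKETCH_UTF8

/* Number of 16-bit units needed to hold a link of link_size bytes
   (assumption: links are stored in whole 16-bit units). */
static int sketch_link_units(int link_size)
{
  assert(link_size >= 2 && link_size <= 4);
  return (link_size + 1) / 2;   /* round up to whole units */
}
```

Under this assumption both LINK_SIZE 3 and LINK_SIZE 4 need two units.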

I have a solution for the reloadability problem as well: we could provide a
function that converts a compiled pattern between endiannesses (although this
would be a low-priority task at the moment). For now it would be enough to
mark the endianness with a flag, and pcre_exec(...) would simply return an
error if the endianness does not match.
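
A rough sketch of that idea, under stated assumptions: the struct, the marker
value and the error codes below are all hypothetical and do not describe the
real compiled-pattern layout. The header carries a marker word; if it reads
back byte-swapped, the pattern came from a host with the other byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and values here are hypothetical, for illustration only. */
#define SKETCH_MAGIC          0x50435245u
#define SKETCH_MAGIC_SWAPPED  0x45524350u
#define SKETCH_OK             0
#define SKETCH_ERR_BADMAGIC   (-1)
#define SKETCH_ERR_BADENDIAN  (-2)

typedef struct {
  uint32_t magic;      /* marker written in host byte order */
  uint16_t code[4];    /* stand-in for the 16-bit pattern units */
} sketch_pattern;

static uint16_t sketch_swap16(uint16_t v)
{
  return (uint16_t)(((v & 0xffu) << 8) | (v >> 8));
}

static uint32_t sketch_swap32(uint32_t v)
{
  return ((v & 0xffu) << 24) | ((v & 0xff00u) << 8) |
         ((v >> 8) & 0xff00u) | (v >> 24);
}

/* What an exec-like entry point could do first: refuse foreign patterns. */
static int sketch_check(const sketch_pattern *p)
{
  assert(p != NULL);
  if (p->magic == SKETCH_MAGIC) return SKETCH_OK;
  if (p->magic == SKETCH_MAGIC_SWAPPED) return SKETCH_ERR_BADENDIAN;
  return SKETCH_ERR_BADMAGIC;
}

/* The lower-priority conversion step: swap every unit in place. */
static int sketch_convert(sketch_pattern *p)
{
  size_t i;
  int rc = sketch_check(p);
  if (rc != SKETCH_ERR_BADENDIAN) return rc;
  p->magic = sketch_swap32(p->magic);
  for (i = 0; i < sizeof(p->code) / sizeof(p->code[0]); i++)
    p->code[i] = sketch_swap16(p->code[i]);
  return SKETCH_OK;
}
```

The check alone is enough for the "return an error" variant; the conversion
function can be added later without changing the marker.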

By the way, does anyone know a clever way to check endianness at compile time?
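
One possible approach, offered as a sketch rather than a portable answer:
recent GCC and Clang predefine __BYTE_ORDER__ together with
__ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__, which the preprocessor can
test. Standard C has no such facility, so the sketch falls back to a runtime
probe elsewhere. The SKETCH_ and sketch_ names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time path: only available where the compiler predefines
   these macros (an assumption about the toolchain, not about C itself). */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    defined(__ORDER_BIG_ENDIAN__)
#  if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#    define SKETCH_LITTLE_ENDIAN 1
#  elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#    define SKETCH_LITTLE_ENDIAN 0
#  endif
#endif

/* Returns 1 on a little-endian host, 0 otherwise. */
int sketch_is_little_endian(void)
{
#ifdef SKETCH_LITTLE_ENDIAN
  return SKETCH_LITTLE_ENDIAN;
#else
  /* Runtime fallback: look at the first byte of a known 16-bit value. */
  uint16_t probe = 1;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 1;
#endif
}
```

Another option would be to have configure detect the byte order and define a
macro, which keeps the check out of the source altogether.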

Philip, I would like to help you with this work.

