[pcre-dev] [Bug 1049] Add support for UTF-16

Author: Zoltan Herczeg
Date:  
To: pcre-dev
Old-Topics: [pcre-dev] [Bug 1049] New: Add support for UTF-16
Subject: [pcre-dev] [Bug 1049] Add support for UTF-16

http://bugs.exim.org/show_bug.cgi?id=1049




--- Comment #7 from Zoltan Herczeg <hzmester@???> 2011-11-11 08:00:03 ---
I am thinking about a different approach.

Would it be possible to compile the PCRE library in two different modes?

Since we should support both char/UTF-8 and wchar/UTF-16 modes, I think
building two libraries might solve the problem. The first would be the same as
before; the second would be the wchar-based library, where the library name,
the public functions and pcre.h would get a postfix string such as _16, w or W,
and PCRE_UTF8 would be replaced by PCRE_UTF16.

With _16 postfix:

You would include "pcre_16.h", link against "libpcre_16.so" and call
pcre_compile2_16(...), pcre_exec_16(...), which all expect wchar_t strings. The
only allowed LINK_SIZE values in wchar_t mode would be 2 and 4.
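
A minimal sketch of how such a postfix could be applied when the same source
is built twice. Every name in it (DEMO_WIDE_MODE, DEMO_PUBLIC, demo_uchar,
demo_length, DEMO_LINK_SIZE) is hypothetical and not part of PCRE; only the
compile-time mode switch, the _16 postfix and the 2-or-4 link size restriction
are taken from the proposal, and the real change would likely look different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switch: define DEMO_WIDE_MODE to build the wchar_t
   flavour of the library; leave it undefined for the classic char flavour. */
#ifdef DEMO_WIDE_MODE
typedef wchar_t demo_uchar;
#define DEMO_PUBLIC(name) name##_16   /* demo_length -> demo_length_16 */
#else
typedef char demo_uchar;
#define DEMO_PUBLIC(name) name        /* names stay unchanged */
#endif

/* Hypothetical stand-in for LINK_SIZE. */
#ifndef DEMO_LINK_SIZE
#define DEMO_LINK_SIZE 2
#endif

/* Reject link sizes that the proposal does not allow in wide mode. */
#if defined(DEMO_WIDE_MODE) && DEMO_LINK_SIZE != 2 && DEMO_LINK_SIZE != 4
#error "wide mode allows only a link size of 2 or 4"
#endif

/* One source, two possible public symbols: counts the code units before the
   terminating zero in the character type selected above. */
size_t DEMO_PUBLIC(demo_length)(const demo_uchar *subject)
{
    size_t n = 0;

    assert(subject != NULL);
    while (subject[n] != 0)
        n++;
    return n;
}
```

Compiled as is, the file exports demo_length(const char *). Compiled with
-DDEMO_WIDE_MODE it exports demo_length_16(const wchar_t *) instead, so the two
object files could be linked into one program without symbol clashes.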

Each build would select either char or wchar mode at configure time. Desktop
systems could ship both libraries; embedded systems could choose just one of
them if needed.

Key advantages:
- fewer code modifications, especially in the compile and execute parts.
- compatibility (the original library would remain the same)

Disadvantages:
- a single application should not include pcre.h and pcre_16.h at the same
time (I don't want to add _16 to all defines), although it could include the
different headers in different .c files and link to both libraries if
needed.
- extra work for library maintainers (although I am sure distros have a
clever way to handle it)

