[pcre-dev] [Bug 791] UTF-8 support does not work on EBCDIC p…

Author: Martin Jerabek
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 791] UTF-8 support does not work on EBCDIC platforms
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=791




--- Comment #4 from Martin Jerabek <martin.jerabek@???> 2008-12-17 13:20:19 ---
On 17.12.2008 12:03, Philip Hazel wrote:
> One other thought struck me: have you considered compiling two different
> versions of PCRE? One would be for use in the EBCDIC case, the other for
> use in the UTF-8 case.

This was my first thought, because for our purposes we only need a UTF-8
version of PCRE, but trying to be a good open-source citizen, I intended
to strive for a general solution. If it is acceptable to you, I will
modify the sources so that all character constants are replaced with
macros which are defined either as normal literals (e.g. '*') or as UTF-8
literals (e.g. '\x2A'), depending on --enable-utf8:

- If --enable-utf8 (or --enable-unicode-properties) is *not* passed to
configure, the macros evaluate to normal character literals just as they
are used now. This is correct for ASCII and EBCDIC platforms and
independent of --enable-ebcdic. On every platform, the "native" code
page is used. On EBCDIC platforms, --enable-ebcdic would still have to
be passed to configure.

- If --enable-utf8 is passed, the macros evaluate to ASCII/UTF-8 codes
such as '\x2A' for asterisk on all platforms. This works on both
non-EBCDIC platforms and on EBCDIC platforms in UTF-8 mode. If
--enable-ebcdic is also passed, a warning is issued saying that the
resulting PCRE library will *only* support UTF-8 and not EBCDIC.
Alternatively configure could return an error in this case to make sure
that the warning is not overlooked, and we could introduce a new option
like --enable-utf8-ebcdic to compile a UTF-8-only PCRE library on EBCDIC
platforms. In this case all appropriate functions would return an error
if PCRE_UTF8 is not passed to them.
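
A minimal sketch of the two macro definitions described above. All names
are hypothetical: DEMO_UTF8_LITERALS stands in for whatever symbol
configure would define for --enable-utf8, and the DEMO_CHAR_* macros are
illustrative, not names taken from the PCRE sources.

```c
/* Hypothetical switch: assumed to be defined by configure when
   --enable-utf8 is given. Not an actual PCRE symbol. */
#ifdef DEMO_UTF8_LITERALS
/* Fixed ASCII/UTF-8 code points, identical on every platform. */
#define DEMO_CHAR_ASTERISK      '\x2A'
#define DEMO_CHAR_PLUS          '\x2B'
#define DEMO_CHAR_QUESTION_MARK '\x3F'
#else
/* Native literals: the compiler picks the code of the platform's own
   code page (ASCII or EBCDIC). */
#define DEMO_CHAR_ASTERISK      '*'
#define DEMO_CHAR_PLUS          '+'
#define DEMO_CHAR_QUESTION_MARK '?'
#endif

/* Returns 1 if c is one of the three simple quantifier characters,
   compared against the macros instead of bare literals. */
int demo_is_quantifier(int c)
{
    return c == DEMO_CHAR_ASTERISK || c == DEMO_CHAR_PLUS ||
           c == DEMO_CHAR_QUESTION_MARK;
}
```

On an ASCII platform both branches yield the same values, so the switch
only changes behaviour on EBCDIC platforms, where the native '*' is not
0x2A.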

> It would be easy to rename all the functions by
> defining some macros, so that you could link both versions into the same
> product.

Do you really mean to replace *all* PCRE function names with macros
which would then be defined, e.g., with and without _ebcdic
appended? I think this is rather messy, but necessary if someone wanted
to use both an EBCDIC-only and a UTF-8-only PCRE library in the same
process. I have been bitten too often by violations of the One Definition
Rule to risk having symbols with identical names in the same process,
because the run-time linker will probably choose the wrong one. I would
rather avoid that and leave the burden of implementing this to the poor
soul who really needs this. ;-)
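
For illustration only, the renaming idea could look roughly like this.
The function demo_variant and both suffixes are made up for the sketch;
a real build would place one #define per public function in a header and
compile the library twice, rather than defining both variants in one
file as done here.

```c
/* "Build 1": every use of the public name is rewritten to the _ebcdic
   symbol by the preprocessor. */
#define demo_variant demo_variant_ebcdic
const char *demo_variant(void) { return "EBCDIC-only build"; }
#undef demo_variant

/* "Build 2": the same source-level name now maps to the _utf8 symbol. */
#define demo_variant demo_variant_utf8
const char *demo_variant(void) { return "UTF-8-only build"; }
#undef demo_variant

/* Both symbols now exist side by side without a name clash. */
const char *demo_pick(int want_utf8)
{
    return want_utf8 ? demo_variant_utf8() : demo_variant_ebcdic();
}
```

Because the two definitions end up with distinct symbol names, the
linker never has to choose between identically named functions.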

Many thanks for your help
Martin


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email