Re: [pcre-dev] [Bug 791] New: UTF-8 support does not work on…

Author: Zack Weinberg
Date:  
To: pcre-dev
CC: 791
Subject: Re: [pcre-dev] [Bug 791] New: UTF-8 support does not work on EBCDIC platforms
On Tue, Dec 16, 2008 at 1:18 PM, Philip Hazel <ph10@???> wrote:
> On Tue, 16 Dec 2008, Martin Jerabek wrote:
>
>> The real problem is now that the PCRE code compares the characters
>> taken from the pattern with normal C character literals such as '\\',
>> '*', etc. This works fine on non-EBCDIC (ASCII) platforms because the
>> 7-bit ASCII subset of UTF-8 is identical to ASCII.
>
> Exactly - I thought that was the whole point of UTF-8.


There is an encoding called UTF-EBCDIC that is probably what you
(Martin) really want to be using here (see
http://www.unicode.org/unicode/reports/tr16/). It is a modification of
UTF-8 with the property that the characters encoded as a single byte
map to a particular EBCDIC code page the same way that UTF-8
single-byte characters map to ASCII.
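
To make the mismatch Martin describes concrete, here is a rough sketch in
Python (not PCRE code). It assumes the EBCDIC platform uses code page 037,
which the thread does not state, and the helper name literal_matches is
made up for illustration. It compares the byte a UTF-8 pattern contains
with the byte a C character literal would have in the compiler's native
code page.

```python
def literal_matches(ch: str, native_codepage: str) -> bool:
    """Return True if the UTF-8 byte for `ch` equals the byte a C
    character literal for `ch` would have in `native_codepage`.

    Hypothetical helper for illustration only; cp037 below is an
    assumed EBCDIC code page, not one named in the thread.
    """
    return ch.encode("utf-8") == ch.encode(native_codepage)


# On an ASCII platform the comparison in the PCRE source works.
ascii_ok = literal_matches("\\", "ascii")

# On an EBCDIC platform (cp037 assumed) the same comparison fails,
# because the literal '\\' is a different byte from the one in the
# UTF-8 pattern.
ebcdic_ok = literal_matches("\\", "cp037")

# Worse, the UTF-8 backslash byte has the same value as the cp037
# byte for '*', so it can be mistaken for a different metacharacter.
collides = "\\".encode("utf-8") == "*".encode("cp037")

print(ascii_ok, ebcdic_ok, collides)
```

With UTF-EBCDIC the single-byte characters are meant to carry their
EBCDIC values, so a comparison against a native literal would hold on
the EBCDIC side the way it does for UTF-8 on ASCII.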

zw