Re: [pcre-dev] Fw: [a-z] class in EBCDIC and Perl-MVS status…

Author: Karl Williamson
Date:
To: Ze'ev Atlas, Philip Hazel, Pcre Exim
CC: Ricardo SIGNES
Subject: Re: [pcre-dev] Fw: [a-z] class in EBCDIC and Perl-MVS status question
On 06/21/2015 09:12 AM, Karl Williamson wrote:
> On 06/20/2015 08:08 PM, Ze'ev Atlas wrote:
>> Hi Philip
>> As promised, I posed the question to the Perl-MVS community, and here is
>> the answer I got (see below). I admit that it was news to me as
>> well, but apparently the Perl-MVS guys went the extra mile to do that.
>> I am not saying that we must follow Perl to the letter, but you may
>> want to consider implementing the below in PCRE2. I may get involved,
>> though it will take me some time (if at all); first we need a decision
>> from you on whether we even want to consider it. If we ultimately decide
>> never to do it, we have to mention it in the documentation as a
>> difference from Perl.
>>
>> Ze'ev Atlas
>
> Perl 5.22 also introduced the concept of a Unicode range, which you
> should think about. This makes all such ranges completely portable
> between ASCII and EBCDIC platforms. /[\N{U+04}-\N{U+06}]/ matches the
> characters U+0004, U+0005, and U+0006, whatever their ordinals might be
> on the platform. The 5.22 documentation has complete details. Until
> that is uploaded properly, you can see it in the source code
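
To make the "whatever their ordinals might be" point concrete, here is a minimal sketch in Python rather than Perl. It assumes Python's built-in cp037 codec is an acceptable stand-in for an EBCDIC platform (a real Perl build may use a different code page), and the helper name native_ordinal is made up for this illustration. It only shows that the three code points of the Unicode range map to different native byte values; it says nothing about how Perl implements the feature internally.

```python
# Sketch only: cp037 is assumed as a representative EBCDIC code page.
CODEC = "cp037"

def native_ordinal(code_point: int) -> int:
    """Return the native (EBCDIC) byte that encodes a Unicode code point."""
    return chr(code_point).encode(CODEC)[0]

# The Unicode range U+0004..U+0006 from the pattern /[\N{U+04}-\N{U+06}]/.
unicode_range = range(0x04, 0x06 + 1)
native = [native_ordinal(cp) for cp in unicode_range]

for cp, byte in zip(unicode_range, native):
    print(f"U+{cp:04X} -> native 0x{byte:02X}")
```

The native bytes are not 0x04 to 0x06, which is why a range written with native ordinals is not portable between ASCII and EBCDIC while a range written with \N{U+...} is.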


Also, the [A-Z] and [a-z] behavior has been the way things have worked
AFAIK since the very first EBCDIC perl.
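
For readers who have not looked at an EBCDIC code chart, the following sketch shows where the gaps come from. It is written in Python and assumes the built-in cp037 codec as a representative EBCDIC code page; it illustrates the layout of the lowercase letters, not Perl's internal handling.

```python
# Sketch only: cp037 is assumed as a representative EBCDIC code page.
CODEC = "cp037"

first = "a".encode(CODEC)[0]  # native ordinal of 'a'
last = "z".encode(CODEC)[0]   # native ordinal of 'z'

# Every native byte from 'a' through 'z', decoded back to characters.
native_range = bytes(range(first, last + 1)).decode(CODEC)

letters = [c for c in native_range if "a" <= c <= "z"]
gaps = [c for c in native_range if not ("a" <= c <= "z")]

print(f"native range: 0x{first:02X}-0x{last:02X}")
print(f"{len(native_range)} bytes, {len(letters)} letters, {len(gaps)} gap characters")
```

Under this assumption the native span from 'a' to 'z' covers more bytes than there are letters; the extra ones are the gap characters that a literal [a-z] excludes and that a range written with native ordinals would include.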
>>
>>
>> ----- Forwarded Message -----
>> *From:* Karl Williamson <public@???>
>> *To:* "Atlas, Ze'Ev" ; "perl-mvs@???"
>> *Sent:* Friday, June 19, 2015 3:38 PM
>> *Subject:* Re: [a-z] class in EBCDIC and Perl-MVS status question
>>
>> On 06/18/2015 09:01 AM, Atlas, Ze'Ev wrote:
>> <snip - some irrelevant material>
>>
>> >
>> > 2. The perlre document in perldoc (5.20) states:
>> >
>> > (The following all specify the same class of three characters: [-az],
>> > [az-], and [a\-z]. All are different from [a-z], which specifies a
>> > class containing twenty-six characters, even on EBCDIC-based character
>> > sets.)
>> >
>> > The implication is that Perl somehow recognizes [a-z], treats it as a
>> > special case in EBCDIC, and ignores the non-letter gaps. Do I understand
>> > it correctly, and is it implemented as advertised?
>> >
>> > Ze'ev Atlas
>>
>> Yes it is implemented as advertised. If you do want to include the gap
>> characters, you can instead write [\x81-\xA9]. But when both ends of
>> the range are literals, like "A", and the range is any subset of [A-Z]
>> or [a-z], special handling is invoked internally to exclude the gap
>> characters.
>>
>>
>> The 5.22 EBCDIC documentation has been extensively revised by me to
>> accurately reflect the actual implementation. Please file a bug report
>> on any discrepancies. There are some known bugs in the EBCDIC version
>> not present when run on ASCII platforms. Unfortunately, the
>> documentation on the web hasn't been properly updated yet to reflect
>> 5.22. Here's what the new perlebcdic says about known EBCDIC problems:
>>
>>        *  The "cmp" (and hence "sort") operators do not necessarily give
>>            the correct results when both operands are UTF-EBCDIC encoded
>>            strings and there is a mixture of ASCII and/or control
>>            characters, along with other characters.
>>
>>        *  Ranges containing "\N{...}" in the "tr///" (and "y///")
>>            transliteration operators are treated differently than the
>>            equivalent ranges in regular expression patterns. They should,
>>            but don't, cause the values in the ranges to all be treated as
>>            Unicode code points, and not native ones. ("Version 8 Regular
>>            Expressions" in perlre gives details as to how it should work.)
>>
>>        *  There are some bugs in the "pack"/"unpack" "U0" template
>>
>>        *  There are a significant number of test failures in the CPAN
>>            modules shipped with Perl v5.22. These are only in modules not
>>            primarily maintained by Perl 5 porters. Some of these are
>>            failures in the tests only: they don't realize that it is
>>            proper to get different results on EBCDIC platforms. And some
>>            of the failures are real bugs. If you compile and do a
>>            "make test" on Perl, all tests on the "/cpan" directory are
>>            skipped.
>>
>>            In particular, the extensions Unicode::Collate and
>>            Unicode::Normalize are not supported under EBCDIC; likewise
>>            for the (now deprecated) encoding pragma.
>>
>>            Encode partially works.