Re: [pcre-dev] Fw: [a-z] class in EBCDIC and Perl-MVS status question

Author: Karl Williamson
Date:
To: Ze'ev Atlas, Philip Hazel, Pcre Exim
CC: Ricardo SIGNES
Subject: Re: [pcre-dev] Fw: [a-z] class in EBCDIC and Perl-MVS status question

On 06/21/2015 09:12 AM, Karl Williamson wrote:
> On 06/20/2015 08:08 PM, Ze'ev Atlas wrote:
>> Hi Philip,
>> As promised, I posed the question to the Perl-MVS community, and here is
>> the answer I got (see below). I admit that it was news to me as
>> well, but apparently the Perl-MVS guys went the extra mile to do that.
>> I am not saying that we must follow Perl to the letter, but you may
>> want to consider implementing the below in PCRE2. I may get involved,
>> although it will take me some time (if I do at all). First, though, we
>> need a decision from you on whether we even want to consider it. If we
>> ultimately decide never to do it, we have to mention it in the
>> documentation as a difference from Perl.
>>
>> Ze'ev Atlas
>
> Perl 5.22 also introduced the concept of a Unicode range, which you
> should think about. This makes all such ranges completely portable
> between ASCII and EBCDIC platforms. /[\N{U+04}-\N{U+06}]/ matches the
> characters U+0004, U+0005, and U+0006, whatever their ordinals might be
> on the platform. The 5.22 documentation has complete details. Until
> that is uploaded properly, you can see it in the source code

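One way to picture the Unicode-range idea quoted above: the endpoints of [\N{U+04}-\N{U+06}] are read as Unicode code points, and each code point in the range is then translated to whatever native ordinal it has on the platform. The following Python sketch is only an illustration under stated assumptions, not how Perl or PCRE2 implement it: it uses IBM code page 037 (Python's built-in "cp037" codec) as the EBCDIC code page, and the function names are hypothetical.

```python
# Illustrative sketch only: models a Perl 5.22-style Unicode range such as
# [\N{U+04}-\N{U+06}] on an EBCDIC platform. Assumes IBM code page 037 via
# Python's built-in "cp037" codec; the function names are hypothetical.
CODEPAGE = "cp037"


def unicode_range_to_native(first_cp: int, last_cp: int) -> list[int]:
    """Native (EBCDIC) ordinals of the code points first_cp..last_cp."""
    return sorted(chr(cp).encode(CODEPAGE)[0] for cp in range(first_cp, last_cp + 1))


def native_range_to_unicode(lo: int, hi: int) -> list[int]:
    """Unicode code points of the native ordinals lo..hi, like [\\x04-\\x06]."""
    return sorted(ord(bytes([b]).decode(CODEPAGE)) for b in range(lo, hi + 1))


# A Unicode range always selects U+0004..U+0006, wherever they sit natively.
native = unicode_range_to_native(0x04, 0x06)
print("Unicode range -> native ordinals:", [hex(n) for n in native])

# The same numbers taken as native ordinals select other characters.
print("native range -> code points:",
      [hex(cp) for cp in native_range_to_unicode(0x04, 0x06)])
```

On code page 037 the translated ordinals are not 0x04-0x06 and need not be contiguous, so an engine supporting this would have to build the class from the translated values rather than from a native min/max pair.
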
Also, AFAIK the [A-Z] and [a-z] behavior has worked this way since the
very first EBCDIC perl.
>>
>>
>> ----- Forwarded Message -----
>> *From:* Karl Williamson <public@???>
>> *To:* "Atlas, Ze'Ev" ; "perl-mvs@???"
>> *Sent:* Friday, June 19, 2015 3:38 PM
>> *Subject:* Re: [a-z] class in EBCDIC and Perl-MVS status question
>>
>> On 06/18/2015 09:01 AM, Atlas, Ze'Ev wrote:
>> <snip - some irrelevant material>
>>
>> >
>> > 2. The perlre document in perldoc (5.20) states:
>> >
>> > (The following all specify the same class of three characters: [-az],
>> > [az-], and [a\-z]. All are different from [a-z], which specifies a
>> > class containing twenty-six characters, even on EBCDIC-based character
>> > sets.)
>> >
>> > The implication is that Perl somehow recognizes [a-z] as a special
>> > case in EBCDIC and ignores the non-letter gaps. Do I understand
>> > it correctly, and is it implemented as advertised?
>> >
>> > Ze'ev Atlas
>>
>> Yes, it is implemented as advertised. If you do want to include the gap
>> characters, you can instead write [\x81-\xA9]. But when both ends of
>> the range are literals, like "A", and the range is any subset of [A-Z]
>> or [a-z], special handling is invoked internally to exclude the gap
>> characters.
>>
>> I have extensively revised the 5.22 EBCDIC documentation to
>> accurately reflect the actual implementation. Please file a bug report
>> on any discrepancies. There are some known bugs in the EBCDIC version
>> that are not present on ASCII platforms. Unfortunately, the
>> documentation on the web hasn't been properly updated yet to reflect
>> 5.22. Here's what the new perlebcdic says about known EBCDIC problems:
>>
>> * The "cmp" (and hence "sort") operators do not necessarily give the
>>   correct results when both operands are UTF-EBCDIC encoded strings and
>>   there is a mixture of ASCII and/or control characters, along with
>>   other characters.
>>
>> * Ranges containing "\N{...}" in the "tr///" (and "y///")
>>   transliteration operators are treated differently than the equivalent
>>   ranges in regular expression patterns. They should, but don't, cause
>>   the values in the ranges to all be treated as Unicode code points, and
>>   not native ones. ("Version 8 Regular Expressions" in perlre gives
>>   details as to how it should work.)
>>
>> * There are some bugs in the "pack"/"unpack" "U0" template.
>>
>> * There are a significant number of test failures in the CPAN modules
>>   shipped with Perl v5.22. These are only in modules not primarily
>>   maintained by Perl 5 porters. Some of these are failures in the tests
>>   only: they don't realize that it is proper to get different results on
>>   EBCDIC platforms. And some of the failures are real bugs. If you
>>   compile and do a "make test" on Perl, all tests on the "/cpan"
>>   directory are skipped.
>>
>> In particular, the extensions Unicode::Collate and Unicode::Normalize
>> are not supported under EBCDIC; likewise for the (now deprecated)
>> encoding pragma.
>>
>> Encode partially works.

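The gap handling Karl describes in the quoted reply can be made concrete. In EBCDIC the lowercase letters sit at 0x81-0x89, 0x91-0x99 and 0xA2-0xA9, so the native range [\x81-\xA9] covers 41 ordinals, only 26 of which are letters. The Python sketch below contrasts the two readings; it is a model of the documented behavior, not Perl's or PCRE's code, and it assumes IBM code page 037 via Python's built-in "cp037" codec, with hypothetical helper names.

```python
# Illustrative sketch only: models the documented [a-z] / [A-Z] behavior on
# an EBCDIC platform. Assumes IBM code page 037 via Python's built-in "cp037"
# codec; the helper names are hypothetical, not Perl or PCRE internals.
import string

CODEPAGE = "cp037"


def native_range(lo: int, hi: int) -> list[str]:
    """Every character whose native ordinal is in lo..hi, like [\\x81-\\xA9]."""
    return [bytes([b]).decode(CODEPAGE) for b in range(lo, hi + 1)]


def literal_letter_range(first: str, last: str) -> list[str]:
    """Range written with literal endpoints, e.g. [a-z] or [i-j].

    If both endpoints are letters of the same case, keep only letters of
    that case, dropping the non-letter characters EBCDIC places in the
    gaps after "i" and "r". Otherwise fall back to the plain native range.
    """
    lo = first.encode(CODEPAGE)[0]
    hi = last.encode(CODEPAGE)[0]
    for alphabet in (string.ascii_lowercase, string.ascii_uppercase):
        if first in alphabet and last in alphabet:
            return [c for c in native_range(lo, hi) if c in alphabet]
    return native_range(lo, hi)


raw = native_range(0x81, 0xA9)            # like [\x81-\xA9]: gaps included
letters = literal_letter_range("a", "z")  # like [a-z]: gaps excluded
print(len(raw), len(letters))             # prints: 41 26
print("".join(letters))
```

The filter checks membership in the ASCII alphabet rather than calling str.isalpha(), because some gap positions in code page 037 hold non-ASCII letters that [a-z] should not match.
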