[pcre-dev] Fw: [a-z] class in EBCDIC and Perl-MVS status question

Author: Ze'ev Atlas
Date:
To: Philip Hazel, Pcre Exim
CC: Karl Williamson
Subject: [pcre-dev] Fw: [a-z] class in EBCDIC and Perl-MVS status question

Hi Philip,

As promised, I posed the question to the Perl-MVS community, and here is the answer I got (see below). I admit that it was news to me as well, but apparently the Perl-MVS guys went the extra mile to do that. I am not saying that we must follow Perl to the letter, but you may want to consider implementing the below in PCRE2. I may get involved, though it will take me some time (if at all); first we need a decision from you on whether we even want to consider it. If we ultimately decide never to do it, we have to mention it in the documentation as a difference from Perl.
Ze'ev Atlas

    ----- Forwarded Message -----
  From: Karl Williamson <public@???>
 To: "Atlas, Ze'Ev" ; "perl-mvs@???" 
 Sent: Friday, June 19, 2015 3:38 PM
 Subject: Re: [a-z] class in EBCDIC and Perl-MVS status question

On 06/18/2015 09:01 AM, Atlas, Ze'Ev wrote:
<snip - some irrelevant material>

>
> 2. The perlre document in perldoc (5.20) states:
>
> (The following all specify the same class of three characters: [-az] ,
> [az-] , and [a\-z] . All are different from [a-z] , which specifies a
> class containing twenty-six characters, even on EBCDIC-based character
> sets.)
>
> The implication is that Perl somehow recognizes [a-z] and treats it as a
> special case in EBCDIC, ignoring the non-letter gaps. Do I understand
> it correctly, and is it implemented as advertised?
>
> Ze'ev Atlas

Yes it is implemented as advertised. If you do want to include the gap
characters, you can instead write [\x81-\xA9]. But when both ends of
the range are literals, like "A", and the range is any subset of [A-Z]
or [a-z], special handling is invoked internally to exclude the gap
characters.
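
To make the gap concrete, here is a minimal sketch in Python. It is not Perl's or PCRE2's implementation. It assumes code page 037 (Python's "cp037" codec) as the EBCDIC variant, and the helper names native_range and literal_range are hypothetical. It compares the native range \x81-\xA9 with a letters-only expansion of a range whose ends are both literal letters of the same case.

```python
import string

# Assumption: code page 037, one common EBCDIC variant shipped with Python.
CODEC = "cp037"

def native_range(lo, hi):
    """All characters whose native code point lies in lo..hi inclusive."""
    return [bytes([b]).decode(CODEC) for b in range(lo, hi + 1)]

def literal_range(first, last):
    """Hypothetical helper: expand a range given as two literal characters.

    If both ends are lowercase (or both uppercase) ASCII letters, keep only
    letters of that case and drop the gap characters. Otherwise fall back
    to the plain native range.
    """
    lo = first.encode(CODEC)[0]
    hi = last.encode(CODEC)[0]
    chars = native_range(lo, hi)
    for alphabet in (string.ascii_lowercase, string.ascii_uppercase):
        if first in alphabet and last in alphabet:
            return [c for c in chars if c in alphabet]
    return chars

full = native_range(0x81, 0xA9)     # corresponds to [\x81-\xA9]
letters = literal_range("a", "z")   # corresponds to [a-z]
gaps = [c for c in full if c not in letters]

print(len(full), len(letters), len(gaps))  # prints: 41 26 15
```

Under this code page, the native range covers 41 code points, of which 15 are non-letter gap characters (for example "~" at 0xA1). The same filtering applies to a sub-range such as i-j, which spans the first gap.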

The 5.22 EBCDIC documentation has been extensively revised by me to
accurately reflect the actual implementation. Please file a bug report
on any discrepancies. There are some known bugs in the EBCDIC version
not present when run on ASCII platforms. Unfortunately, the
documentation on the web hasn't been properly updated yet to reflect
5.22. Here's what the new perlebcdic says about known EBCDIC problems:

* The "cmp" (and hence "sort") operators do not necessarily give the
  correct results when both operands are UTF-EBCDIC encoded strings
  and there is a mixture of ASCII and/or control characters, along
  with other characters.

* Ranges containing "\N{...}" in the "tr///" (and "y///")
  transliteration operators are treated differently than the
  equivalent ranges in regular expression patterns. They should, but
  don't, cause the values in the ranges to all be treated as Unicode
  code points, and not native ones. ("Version 8 Regular Expressions"
  in perlre gives details as to how it should work.)

* There are some bugs in the "pack"/"unpack" "U0" template

* There are a significant number of test failures in the CPAN modules
  shipped with Perl v5.22. These are only in modules not primarily
  maintained by Perl 5 porters. Some of these are failures in the
  tests only: they don't realize that it is proper to get different
  results on EBCDIC platforms. And some of the failures are real
  bugs. If you compile and do a "make test" on Perl, all tests on the
  "/cpan" directory are skipped.

  In particular, the extensions Unicode::Collate and
  Unicode::Normalize are not supported under EBCDIC; likewise for the
  (now deprecated) encoding pragma.

  Encode partially works.
