Re: [pcre-dev] Fw: [a-z] class in EBCDIC and Perl-MVS status question

Author: Karl Williamson
Date:
To: Ze'ev Atlas, Philip Hazel, Pcre Exim
CC: Ricardo SIGNES
Subject: Re: [pcre-dev] Fw: [a-z] class in EBCDIC and Perl-MVS status question

On 06/20/2015 08:08 PM, Ze'ev Atlas wrote:
> Hi Philip
> As promised, I posed the question to the Perl-MVS community and here is
> the answer I got (see below). I admit that it was news to me as
> well, but apparently the Perl-MVS guys went the extra mile to do that.
> I am not saying that we must follow Perl to the letter, but you may
> want to consider implementing the below in PCRE2. I may get involved,
> though it will take me some time (if at all), but first we need a decision
> from you on whether we even want to consider that. If ultimately we decide not
> to do it ever, we have to mention it in the documentation as a
> difference with Perl.
>
> Ze'ev Atlas

Perl 5.22 also introduced the concept of a Unicode range, which you
should think about. This makes all such ranges completely portable
between ASCII and EBCDIC platforms. /[\N{U+04}-\N{U+06}]/ matches the
characters U+0004, U+0005, and U+0006, whatever their ordinals might be
on the platform. The 5.22 documentation has complete details. Until
that is uploaded properly, you can see it in the source code
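
To illustrate why a Unicode range needs translating on an EBCDIC platform, here is a minimal Python sketch. It is not Perl's implementation. It assumes Python's "cp037" codec as a stand-in for the platform's EBCDIC code page (a real z/OS Perl may use a different page such as 1047), and the helper name unicode_range_to_native is hypothetical.

```python
# Assumption: Python's "cp037" codec stands in for the native EBCDIC code page.
CODEC = "cp037"


def unicode_range_to_native(first, last):
    """Return the native (EBCDIC) ordinals of code points first..last inclusive."""
    return [chr(cp).encode(CODEC)[0] for cp in range(first, last + 1)]


# U+0004..U+0006 are consecutive in Unicode; print where they land natively.
native = unicode_range_to_native(0x04, 0x06)
print([hex(b) for b in native])
```

Under this assumed code page the three native ordinals differ from 4, 5, 6, so a portable range has to be resolved per code point rather than by native ordinal.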
>
>
> ----- Forwarded Message -----
> *From:* Karl Williamson <public@???>
> *To:* "Atlas, Ze'Ev" ; "perl-mvs@???"
> *Sent:* Friday, June 19, 2015 3:38 PM
> *Subject:* Re: [a-z] class in EBCDIC and Perl-MVS status question
>
> On 06/18/2015 09:01 AM, Atlas, Ze'Ev wrote:
> <snip - some irrelevant material>
>
> >
> > 2. The perlre document in perldoc (5.20) states:
> >
> > (The following all specify the same class of three characters: [-az] ,
> > [az-] , and [a\-z] . All are different from [a-z] , which specifies a
> > class containing twenty-six characters, even on EBCDIC-based character
> > sets.)
> >
> > The implication is that Perl somehow recognizes [a-z], treats it as a
> > special case in EBCDIC, and ignores the non-letter gaps. Do I understand
> > it correctly and is it implemented as advertised?
> >
> > Ze'ev Atlas
>
> Yes, it is implemented as advertised. If you do want to include the gap
> characters, you can instead write [\x81-\xA9]. But when both ends of
> the range are literals, like "A", and the range is any subset of [A-Z]
> or [a-z], special handling is invoked internally to exclude the gap
> characters.
>
>
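
The gap behaviour described above can be sketched in Python. This is only an illustration, not how Perl or PCRE2 does it internally. It assumes Python's "cp037" codec as a stand-in for the platform's EBCDIC code page, and the names native_range and letters_only_range are hypothetical.

```python
import string

# Assumption: Python's "cp037" codec stands in for the native EBCDIC code page.
CODEC = "cp037"


def native_range(lo, hi):
    """Every character whose native ordinal lies between lo and hi inclusive."""
    start = lo.encode(CODEC)[0]
    end = hi.encode(CODEC)[0]
    return [bytes([b]).decode(CODEC) for b in range(start, end + 1)]


def letters_only_range(lo, hi):
    """The same range with the non-letter gap characters excluded."""
    return [c for c in native_range(lo, hi) if c in string.ascii_letters]


full = native_range("a", "z")            # analogous to [\x81-\xA9]
letters = letters_only_range("a", "z")   # analogous to [a-z] with special handling
print(len(full), len(letters))
print("gap characters:", [c for c in full if c not in letters])
```

In this assumed code page "a" is 0x81 and "z" is 0xA9, so the raw native range spans more ordinals than there are letters; the filtered version keeps only the twenty-six letters.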
> The 5.22 EBCDIC documentation has been extensively revised by me to
> accurately reflect the actual implementation. Please file a bug report
> on any discrepancies. There are some known bugs in the EBCDIC version
> not present when run on ASCII platforms. Unfortunately, the
> documentation on the web hasn't been properly updated yet to reflect
> 5.22. Here's what the new perlebcdic says about known EBCDIC problems:
>
> * The "cmp" (and hence "sort") operators do not necessarily give the
>   correct results when both operands are UTF-EBCDIC encoded strings and
>   there is a mixture of ASCII and/or control characters, along with
>   other characters.
>
> * Ranges containing "\N{...}" in the "tr///" (and "y///")
>   transliteration operators are treated differently than the equivalent
>   ranges in regular expression patterns. They should, but don't, cause
>   the values in the ranges to all be treated as Unicode code points, and
>   not native ones. ("Version 8 Regular Expressions" in perlre gives
>   details as to how it should work.)
>
> * There are some bugs in the "pack"/"unpack" "U0" template.
>
> * There are a significant number of test failures in the CPAN modules
>   shipped with Perl v5.22. These are only in modules not primarily
>   maintained by Perl 5 porters. Some of these are failures in the tests
>   only: they don't realize that it is proper to get different results on
>   EBCDIC platforms. And some of the failures are real bugs. If you
>   compile and do a "make test" on Perl, all tests on the "/cpan"
>   directory are skipped.
>
>   In particular, the extensions Unicode::Collate and Unicode::Normalize
>   are not supported under EBCDIC; likewise for the (now deprecated)
>   encoding pragma.
>
>   Encode partially works.
