Re: [pcre-dev] Fw: [a-z] class in EBCDIC and Perl-MVS status…

Author: Karl Williamson
Date:  
To: Ze'ev Atlas, Philip Hazel, Pcre Exim
Cc: Ricardo SIGNES
Subject: Re: [pcre-dev] Fw: [a-z] class in EBCDIC and Perl-MVS status question
On 06/20/2015 08:08 PM, Ze'ev Atlas wrote:
> Hi Philip
> As promised, I posed the question to the Perl-MVS community and here is
> the answer I've got (see below). I admit that it was news to me as
> well, but apparently the Perl-MVS guys went the extra mile to do that.
> I am not saying that we must follow Perl to the letter, but you may
> want to consider implementing the below in PCRE2. I may get involved,
> but it will take me some time (if at all), but first we need a decision
> from you if we want even to consider that. If ultimately we decide not
> to do it ever, we have to mention it in the documentation as a
> difference with Perl.
>
> Ze'ev Atlas


Perl 5.22 also introduced the concept of a Unicode range, which you
should think about. This makes all such ranges completely portable
between ASCII and EBCDIC platforms. /[\N{U+04}-\N{U+06}]/ matches the
characters U+0004, U+0005, and U+0006, whatever their ordinals might be
on the platform. The 5.22 documentation has complete details. Until
that is uploaded properly, you can see it in the source code.
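
To make the portability point concrete, here is a minimal sketch in Python rather than Perl. It assumes the standard cp037 codec is an acceptable stand-in for an EBCDIC platform (real Perl builds may use a different code page), and the helper name native_ordinals is purely illustrative. It shows that the same three Unicode code points have different native ordinals depending on the platform encoding, which is why a range written in Unicode terms has to be translated per platform.

```python
def native_ordinals(first, last, codec):
    """Return the native byte value of each Unicode code point in [first, last].

    `codec` names the platform encoding; "cp037" is an assumed stand-in
    for an EBCDIC platform, not necessarily the code page Perl uses.
    """
    return [chr(cp).encode(codec)[0] for cp in range(first, last + 1)]


ascii_ords = native_ordinals(0x04, 0x06, "ascii")
ebcdic_ords = native_ordinals(0x04, 0x06, "cp037")

print("ASCII :", [hex(o) for o in ascii_ords])
print("EBCDIC:", [hex(o) for o in ebcdic_ords])
```

A class written as a native range such as [\x04-\x06] would therefore select different characters on the two platforms, whereas the \N{U+...} form names the characters themselves.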
>
>
> ----- Forwarded Message -----
> *From:* Karl Williamson <public@???>
> *To:* "Atlas, Ze'Ev" ; "perl-mvs@???"
> *Sent:* Friday, June 19, 2015 3:38 PM
> *Subject:* Re: [a-z] class in EBCDIC and Perl-MVS status question
>
> On 06/18/2015 09:01 AM, Atlas, Ze'Ev wrote:
> <snip - some irrelevant material>
>
> >
> > 2. The perlre document in perldoc (5.20) states:
> >
> > (The following all specify the same class of three characters: [-az] ,
> > [az-] , and [a\-z] . All are different from [a-z] , which specifies a
> > class containing twenty-six characters, even on EBCDIC-based character
> > sets.)
> >
> > The implication is that Perl somehow recognizes [a-z] and treats it as a
> > special case in EBCDIC and ignores the non-letter gaps. Do I understand
> > it correctly and is it implemented as advertised?
> >
> > Ze'ev Atlas
>
> Yes it is implemented as advertised. If you do want to include the gap
> characters, you can instead write [\x81-\xA9]. But when both ends of
> the range are literals, like "A", and the range is any subset of [A-Z]
> or [a-z], special handling is invoked internally to exclude the gap
> characters.
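
The "gap characters" can be seen with a short Python sketch. It assumes the cp037 codec as a representative EBCDIC code page (the exact gap characters vary between EBCDIC variants), and split_range is a hypothetical helper, not anything from Perl or PCRE. It walks the native range 0x81-0xA9 and separates the letters a-z from everything else that falls between them.

```python
# Assumption: cp037 stands in for the EBCDIC code page in use.
EBCDIC_CODEC = "cp037"


def split_range(lo, hi, codec=EBCDIC_CODEC):
    """Decode every byte in [lo, hi] and separate a-z from the rest."""
    letters, gaps = [], []
    for byte in range(lo, hi + 1):
        ch = bytes([byte]).decode(codec)
        # Compare against ASCII a-z only; some gap characters are
        # themselves lowercase letters in Unicode terms.
        if "a" <= ch <= "z":
            letters.append((byte, ch))
        else:
            gaps.append((byte, ch))
    return letters, gaps


letters, gaps = split_range(0x81, 0xA9)
print(len(letters), "letters,", len(gaps), "gap characters")
for byte, ch in gaps:
    print(f"  0x{byte:02X} -> U+{ord(ch):04X}")
```

With cp037 the range holds 41 code points: the 26 letters plus 15 gap characters. That is the difference between writing [\x81-\xA9] and writing [a-z] with the special handling described above.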
>
>
> The 5.22 EBCDIC documentation has been extensively revised by me to
> accurately reflect the actual implementation. Please file a bug report
> on any discrepancies. There are some known bugs in the EBCDIC version
> not present when run on ASCII platforms. Unfortunately, the
> documentation on the web hasn't been properly updated yet to reflect
> 5.22. Here's what the new perlebcdic says about known EBCDIC problems:
>
>        *  The "cmp" (and hence "sort") operators do not necessarily give the
>            correct results when both operands are UTF-EBCDIC encoded strings
>            and there is a mixture of ASCII and/or control characters, along
>            with other characters.
>
>        *  Ranges containing "\N{...}" in the "tr///" (and "y///")
>            transliteration operators are treated differently than the
>            equivalent ranges in regular expression patterns. They should,
>            but don't, cause the values in the ranges to all be treated as
>            Unicode code points, and not native ones. ("Version 8 Regular
>            Expressions" in perlre gives details as to how it should work.)
>
>        *  There are some bugs in the "pack"/"unpack" "U0" template
>
>        *  There are a significant number of test failures in the CPAN
>            modules shipped with Perl v5.22. These are only in modules not
>            primarily maintained by Perl 5 porters. Some of these are
>            failures in the tests only: they don't realize that it is proper
>            to get different results on EBCDIC platforms. And some of the
>            failures are real bugs. If you compile and do a "make test" on
>            Perl, all tests on the "/cpan" directory are skipped.
>
>            In particular, the extensions Unicode::Collate and
>            Unicode::Normalize are not supported under EBCDIC; likewise for
>            the (now deprecated) encoding pragma.
>
>            Encode partially works.