Re: [pcre-dev] issues with EBCDIC and pcretest

Author: ph10
Date:
To: Ze'ev Atlas
CC: pcre-dev@exim.org
Subject: Re: [pcre-dev] issues with EBCDIC and pcretest
On Thu, 18 Jun 2015, Ze'ev Atlas wrote:

I have added \x41 to the list that is recognized by \h and committed the
patch.

> An interesting point: the perlre document in perldoc (5.20) states: (The following all specify the same class of three characters: [-az], [az-], and [a\-z]. All are different from [a-z], which specifies a class containing twenty-six characters, even on EBCDIC-based character sets.)
>
> Apparently, Perl somehow recognizes [a-z] as a special case in EBCDIC and ignores the non-letter gaps. This is news to me. Did you know that? I intend to ask on the perl-mvs forum what they do about it.


I did not know that. PCRE does not treat [a-z] as special.
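
To illustrate the gaps being discussed, here is a minimal Python sketch. It assumes code page 037 (Python's "cp037" codec) as a stand-in for EBCDIC, since the thread does not name a code page; the helper name ebcdic_range is made up for this example. It walks every byte value from 'a' to 'z' and counts how many are not lowercase letters, which is what a purely numeric [a-z] range would also pick up.

```python
# Assumption: code page 037 stands in for "EBCDIC"; the thread names no code page.
CODEC = "cp037"


def ebcdic_range(first: str, last: str) -> str:
    """Decode every byte value between two characters, ordered by EBCDIC code."""
    lo = first.encode(CODEC)[0]
    hi = last.encode(CODEC)[0]
    return bytes(range(lo, hi + 1)).decode(CODEC)


span = ebcdic_range("a", "z")                       # everything a numeric a-z range covers
letters = [c for c in span if "a" <= c <= "z"]      # the 26 real lowercase letters
gaps = [c for c in span if not ("a" <= c <= "z")]   # non-letters between i/j and r/s

print(len(span), len(letters), len(gaps))  # prints "41 26 15"
```

A matcher that treats [a-z] as a plain code point range would accept all 41 values under this assumption; getting exactly twenty-six requires special handling of the kind the Perl documentation describes.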

> Obviously, I know that \p and \P are useless, but the tests are odd, and I am trying to reduce the level of oddity as much as I can.


There was a bug: no error was diagnosed for \p and \P within a class
when UCP support was disabled. I have fixed that.

> While 0x41 is indeed not in any class that I may have thought about,
> 0x25 is actually in some.


> /[\h]/BZ

------------------------------------------------------------------
        Bra
        [\x05\x0b-\x0d\x15\x25 ]
        Ket
        End
------------------------------------------------------------------

That is wrong! It should only be \x05, space, and (now) \x41. Those
vertical spaces should not be there. Can you check again, please?
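
The discrepancy can be made explicit with a small set comparison. This is only a sketch: the byte values are copied from the class dump above and from the expected list in this message, the literal space in the class is taken to be EBCDIC 0x40, and code page 037 (Python's "cp037" codec) is again an assumption used only to show which characters the bytes stand for.

```python
# Assumption: code page 037 is used only to name the characters; the thread
# does not say which EBCDIC code page the build targets.
CODEC = "cp037"

# Bytes in the reported [\h] class; the literal space is EBCDIC 0x40.
reported_h = {0x05, 0x0B, 0x0C, 0x0D, 0x15, 0x25, 0x40}
# Bytes \h should match according to this message, including the new \x41.
expected_h = {0x05, 0x40, 0x41}
# Bytes in the [\v] class, which this message confirms as correct.
vertical = {0x0B, 0x0C, 0x0D, 0x15, 0x25}

unexpected = reported_h - expected_h   # in the dump, but should not be
missing = expected_h - reported_h      # should be there, but is not in the dump

print("unexpected:", sorted(hex(b) for b in unexpected))
print("missing:   ", sorted(hex(b) for b in missing))

# Show the Unicode code point each byte decodes to under the assumed code page.
for byte in sorted(reported_h | expected_h):
    ch = bytes([byte]).decode(CODEC)
    print(f"\\x{byte:02x} -> U+{ord(ch):04X}")
```

The unexpected bytes are exactly the [\v] set, which fits the reading that the vertical spaces have leaked into the \h class; \x41 is missing only because the dump predates the patch.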

> /[\v]/BZ

------------------------------------------------------------------
        Bra
        [\x0b-\x0d\x15\x25]
        Ket
        End
------------------------------------------------------------------

That one is correct.


> /\R/SI

Starting chars: \x0b \x0c \x0d \x15 \x25

That is correct.

Philip

--
Philip Hazel