Author: Ze'ev Atlas
Date:
To: pcre-dev@exim.org
Subject: Re: [pcre-dev] issues with EBCDIC and pcretest
#1
>0xa0 is indeed a non-breaking space in 8859 and Unicode. I don't know
>what the EBCDIC equivalent is ... a quick Google suggests that it might
>be 0x41:

I tried it.

TESTOUT1:
----------------
/\H*\h+\V?\v{3,4}/
    \x09\x20\xa0X\x0a\x0b\x0c\x0d\x0a
 0: \x09 \xa0X\x0a\x0b\x0c\x0d
    \x09\x20\xa0\x0a\x0b\x0c\x0d\x0a
 0: \x09 \xa0\x0a\x0b\x0c\x0d
    \x09\x20\xa0\x0a\x0b\x0c
 0: \x09 \xa0\x0a\x0b\x0c
    ** Failers
No match
    \x09\x20\xa0\x0a\x0b
No match

My tests:
------------
/\H*\h+\V?\v{3,4}/
    \x05\x40\x41X\x15\x0b\x0c\x0d\x15
No match
    \x05\x40\x41\x15\x0b\x0c\x0d\x15
 0: \x05 \x15\x0b\x0c\x0d
    \x05\x40\x41\x15\x0b\x0c
 0: \x05 \x15\x0b\x0c
    ** Failers
No match
    \x05\x40\x41\x15\x0b
No match

0x41 is not recognized as any of \h, \t, or \v (nor as \s, but that is consistent with ASCII).
0x25 is not recognized as anything either (although it is recognized as part of <any> and <bsr_unicode>).
I cannot accurately reproduce all the tests that involve 0x85, nor those that involve 0xa0. Unless we think that something could and should be done about those, I would close my tests for testinput1 and testinput2. What do you think?
#2
An interesting point: the perlre document in perldoc (5.20) states: (The following all specify the same class of three characters: [-az], [az-], and [a\-z]. All are different from [a-z], which specifies a class containing twenty-six characters, even on EBCDIC-based character sets.)
Apparently, Perl somehow recognizes [a-z], treats it as a special case in EBCDIC, and ignores the non-letter gaps. This is news to me. Did you know that? I intend to ask in the perl-mvs forum what they do about it.
Ze'ev Atlas