Author: Ze'ev Atlas
Date:
To: Pcre Exim, Philip Hazel
Subject: [pcre-dev] issues with EBCDIC and pcretest
Hi, I am now testing the EBCDIC package. I have ignored all the obvious differences, such as ¬ vs. ^ or \x0a vs. \x15 (newline), and am concentrating on the less obvious ones. There are not too many of those (about 216 in the 4 relevant files), and I would probably dismiss many of them as irrelevant anyway.
The issues I see now may be marginal, but some of them are important. I do not know whether the problem is in PCRE or in pcretest.
Failure #1 (derived from testinput1):
/abcd\t\n\r\f\a\e/
    abcd\t\n\r\f\a\e9;\$\\?caxyz
No match
So I replaced the offending \e with a dot, only to find that I have a problem with both \a and \e:
/abcd\t\n\r\f\a.\371/
    abcd\t\n\r\f\a\e9;\$\\?caxyz
 0: abcd\x05\x15\x0d\x0c\x07\x1b9
\a should not be translated into \x07, and \x07 should not be matched: BEL is \x2f in EBCDIC. \x1b is indeed the ESC in EBCDIC, and it should have been recognized.
Failure #2
In testinput1, on lines 159 and 160, there is a character that I cannot really recognize in my ASCII editor, and it does not translate well into EBCDIC. Could you please let me know what it is? Is it a UTF-8 character that got in here by mistake?
Failure #3
Please consider:
/¬\ca\cA\c[/
    \x01\x01\e;z
No match

/¬\ca\cA.;/
    \x01\x01\e;z
 0: \x01\x01\x1b;
Again, \c[ should be ESC and thus \x1b, according to the perlebcdic document. BTW, \c: is not defined in that document, so I ignored the tests that involve it. Ideally, PCRE would produce an error message identifying this as an undefined escape sequence, but I understand why it does not do so for such marginal usage.
Failure #4
Please consider:
/¬[\w][\W][\s][\S][\d][\D][\b][\n][\c]][\022]/
    a+ Z0+\xf8\n\x1d\x12
No match

/¬[\w][\W][\s][\S][\d][\D]/
    a+ Z0+\xf8\n\x1d\x12
 0: a+ Z0+

/¬[\w][\W][\s][\S][\d][\D][\b]/
    a+ Z0+\xf8\n\x1d\x12
No match
Why didn't [\b] match?
Failure #5
This is a clear pcretest issue:
/¬[w-C_¬]+$/
    wxy_¬ABC
 0: wxy_¬ABC
    *** Failers
No match
    WXY
No match
It does not recognize the '*** Failers' string and tries to match it. This is no more than an annoyance, but it may point to a real issue.