Author: Ze'ev Atlas
Date:
To: pcre-dev@exim.org
Subject: Re: [pcre-dev] Porting PCRE to z/OS - current status 07/29/2012
Hi
Thanks, Philip.
>I see from the pcretest.c file that
>an output file is opened with mode "wb", i.e. "binary". Maybe if you
>change this to just "w" it might behave differently. The output mode is
>held in a macro called OUTPUT_MODE, defined separately for Windows and
>non-Windows (though the same value is used for both for OUTPUT_MODE).
>There is also INPUT_MODE, which is set non-binary for Windows and binary
>for everything else.
Under the NATIVE_ZOS option (defined in the z/OS config.h; the macros themselves are in pcretest.c, or TESTD in my library), I changed INPUT_MODE and OUTPUT_MODE to plain "r" and "w" respectively. Now the results, including the "incorrect" ones, are printed correctly.
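A minimal sketch of the mode selection described above, assuming NATIVE_ZOS is defined only by the z/OS config.h. The non-z/OS branches follow the quoted description of pcretest.c rather than copying it, and open_test_output() and open_test_input() are hypothetical helpers added for illustration.

```c
#include <stdio.h>

#if defined(NATIVE_ZOS)
/* z/OS (assumption): plain text modes, so the C runtime applies its own
   text-file conventions instead of passing raw bytes through. */
#define INPUT_MODE  "r"
#define OUTPUT_MODE "w"
#elif defined(_WIN32) || defined(WIN32)
/* Windows: text input, binary output (per the quoted description). */
#define INPUT_MODE  "r"
#define OUTPUT_MODE "wb"
#else
/* Everything else: binary input and binary output. */
#define INPUT_MODE  "rb"
#define OUTPUT_MODE "wb"
#endif

/* Hypothetical helper: open the test output file with the configured mode.
   Returns NULL if fopen() fails. */
static FILE *open_test_output(const char *path)
{
    return fopen(path, OUTPUT_MODE);
}

/* Hypothetical helper: open a test input file with the configured mode. */
static FILE *open_test_input(const char *path)
{
    return fopen(path, INPUT_MODE);
}
```

Keeping the choice in two macros means the rest of the test driver calls fopen() the same way on every platform; only the config decides between text and binary.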
Here are some preliminary results for a small part of input1.txt that demonstrate what I will have to deal with.
As I had suspected, the /i modifier does not work correctly under EBCDIC. I will have to investigate how the EBCDIC tables are set up and how the logic identifies upper-case/lower-case pairs: whether it can cope with a non-contiguous encoding, and whether it knows that x'C1' ('A') corresponds to x'81' ('a'), and so on.
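To make the case-pairing question concrete, here is a sketch of a flip-case table for EBCDIC letters, assuming code page 1047 (037 places the letters at the same positions). It is not PCRE's table-building code, and all function names are hypothetical. It shows the two properties the case logic has to respect: the alphabet is split into three non-contiguous runs, and each upper-case letter is its lower-case partner plus 0x40.

```c
#include <stdbool.h>

/* EBCDIC letter runs (1047 and 037 agree on these):
     a-i 0x81-0x89  <->  A-I 0xC1-0xC9
     j-r 0x91-0x99  <->  J-R 0xD1-0xD9
     s-z 0xA2-0xA9  <->  S-Z 0xE2-0xE9
   Bytes between the runs (e.g. 0x8A, 0xCA) are not letters. */
static bool ebcdic_is_lower(unsigned char c)
{
    return (c >= 0x81 && c <= 0x89) ||
           (c >= 0x91 && c <= 0x99) ||
           (c >= 0xA2 && c <= 0xA9);
}

static bool ebcdic_is_upper(unsigned char c)
{
    return (c >= 0xC1 && c <= 0xC9) ||
           (c >= 0xD1 && c <= 0xD9) ||
           (c >= 0xE2 && c <= 0xE9);
}

/* Return the other-case partner of an EBCDIC letter, or c unchanged. */
static unsigned char ebcdic_flip_case(unsigned char c)
{
    if (ebcdic_is_lower(c)) return (unsigned char)(c + 0x40);
    if (ebcdic_is_upper(c)) return (unsigned char)(c - 0x40);
    return c;
}

/* Fill a 256-entry flip-case table (hypothetical helper). */
static void build_flip_case_table(unsigned char table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = ebcdic_flip_case((unsigned char)i);
}
```

A table built this way can be checked entry by entry against whatever the z/OS build actually produces for the /i case.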
The pattern /abcd\t\n\r\f\a\e\071\x3b\$\\\?caxyz/ correctly does NOT match abcd\t\n\r\f\a\e9;\$\\?caxyz, because octal 071 is not the numeral 9 in EBCDIC (x'F9' is). So PCRE is right here, even though the result differs from the one in the ASCII world.
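The arithmetic behind this non-match, as a small sketch (helper names hypothetical): the escapes \071 and \x3b denote the bytes 0x39 and 0x3B, while in EBCDIC the digits occupy x'F0' to x'F9' and the semicolon is x'5E'. A byte-for-byte comparison therefore fails on EBCDIC for both characters.

```c
#include <stdbool.h>

/* EBCDIC (037 and 1047 alike): digits 0-9 occupy x'F0'-x'F9'. */
static unsigned char ebcdic_digit(int d)   /* d must be 0..9 */
{
    return (unsigned char)(0xF0 + d);
}

#define EBCDIC_SEMICOLON 0x5E   /* ';' in EBCDIC; it is 0x3B in ASCII */

/* In this sketch a numeric escape matches a subject byte only if the
   values are equal; no code-page translation is applied. */
static bool escape_matches(unsigned int escape_value, unsigned char subject_byte)
{
    return escape_value == subject_byte;
}
```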
As I have noted before, we have to use the character '¬' (x'5F') instead of '^' (x'B0').
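A small lookup table of bytes where code pages 037 and 1047 disagree on characters that matter for regex syntax may help here. In 037, x'5F' is the not sign and the circumflex is x'B0'; 1047 swaps the two and also moves the square brackets. One possible reading of the observation above, offered only as an assumption, is that the terminal displays bytes under 037 while the compiled code uses 1047, so the byte shown as '¬' (x'5F') is the 1047 circumflex. The struct and function names are hypothetical, and the table is not claimed to be a complete list of differences.

```c
#include <stddef.h>

/* How a single byte is read under code page 037 versus 1047. */
struct cp_diff {
    unsigned char byte;
    const char *in_037;
    const char *in_1047;
};

static const struct cp_diff cp_diffs[] = {
    { 0x5F, "not sign",        "circumflex ^"    },
    { 0xB0, "circumflex ^",    "not sign"        },
    { 0xAD, "Y acute",         "left bracket ["  },
    { 0xBA, "left bracket [",  "Y acute"         },
    { 0xBB, "right bracket ]", "diaeresis"       },
    { 0xBD, "diaeresis",       "right bracket ]" },
};

/* Return the entry for byte b, or NULL if b is not in the table
   (i.e. 037 and 1047 are assumed to agree on it). */
static const struct cp_diff *find_cp_diff(unsigned char b)
{
    for (size_t i = 0; i < sizeof cp_diffs / sizeof cp_diffs[0]; i++)
        if (cp_diffs[i].byte == b)
            return &cp_diffs[i];
    return NULL;
}
```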
The pattern /¬(b+|a){1,2}?bc/ does not match bbc; this is one of many things I will have to investigate.
Similarly, the pattern /¬(b*|ba){1,2}?bc/ does not match bbabc, although it does match babc, giving the same result as the ASCII version:
 0: babc
 1: ba
There are similar results for similar patterns later in the file.
The pattern /¬\ca\cA\c[\c{\c:/ does not match \x01\x01\e;z. This is somewhat surprising, because these shortcuts should be correct. I will have to review the EBCDIC tables, what exactly \c[, \c{ and \c: mean in EBCDIC, and how PCRE interprets them. I suspect that they conform to the old code page 037 rather than 1047, but even that is not a good enough explanation.
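For reference, the documented ASCII rule for \cx in PCRE is: if x is a lower-case letter it is converted to upper case, then bit 6 (hex 40) is inverted. The sketch below applies that rule to ASCII byte values (function name hypothetical) and reproduces the \x01\x01\e;z subject from the ASCII test. It deliberately says nothing about the EBCDIC mapping, which is the open question above.

```c
/* ASCII rule for \cx: upper-case a lower-case letter, then invert bit 6.
   Results for the escapes in the pattern:
     \ca -> 0x01, \cA -> 0x01, \c[ -> 0x1B (ESC, i.e. \e),
     \c{ -> 0x3B (';'), \c: -> 0x7A ('z') */
static unsigned char ascii_ctrl_escape(unsigned char x)
{
    if (x >= 0x61 && x <= 0x7A)          /* 'a'..'z' in ASCII */
        x = (unsigned char)(x - 0x20);   /* convert to upper case */
    return (unsigned char)(x ^ 0x40);    /* invert bit 6 */
}
```

Using numeric ASCII values rather than character literals keeps the sketch independent of the compiler's execution character set, which matters when the same source is built on z/OS.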
Besides resolving these issues and others that will come to light, I will have to develop EBCDIC-specific tests.
BTW, x'FF' is EO (Eight Ones) in EBCDIC and thus correctly, if abruptly, stops the FTP transfer.