Re: [pcre-dev] issues with EBCDIC and pcretest

Author: Ze'ev Atlas
Date:
To: Pcre Exim, Philip Hazel
Subject: Re: [pcre-dev] issues with EBCDIC and pcretest
In my opinion, for the \c sequences, pcre_internal.h should have a section that explicitly defines the characters below (taken from perlebcdic). Something like:
#if defined NATIVE_ZOS
  #if defined IBM1047
    #define CHAR_SOH   '\x01'
    ...
    #define CHAR_DEL   '\x07'
    ...
  #elif defined IBM037
    ...
  #endif
/* ASCII definitions: */
#else
  ...
  #define CHAR_BEL   '\x07'
  ...
#endif
And then, where you deal with \c:
#if defined NATIVE_ZOS
  resultchar = deal_with_escape_c(input_char);  /* do some search on an array */
  ...
#else
  /* ... whatever you do today */
#endif
It is possible that you already have that in place, but I did not find such functionality, and I volunteer to code it if you tell me where it should be placed.
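
The "search on an array" idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name deal_with_escape_c is the hypothetical one used in this message (it is not an existing PCRE function), the ctrl_order table is invented for the sketch, and the result is assumed to be the value in the "ord" column of the perlebcdic table. Because the string literal is stored in the compiler's execution character set, the lookup does not depend on the letters being contiguous, which they are not in EBCDIC. The \c? case is left unhandled because the table gives no ordinal for it.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical table for this sketch: the character at index k is the one
   that \c turns into ordinal k (the "ord" column of the perlebcdic table). */
static const char ctrl_order[] = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_";

/* Hypothetical helper (name taken from the email, not from PCRE).
   Returns the ordinal 0..31 for \c<ch>, or -1 if <ch> is not in the table.
   \c? is deliberately not handled here. */
static int deal_with_escape_c(int ch)
{
    const char *p;

    if (ch == '\0')
        return -1;                       /* strchr would match the terminator */
    if (islower((unsigned char)ch))
        ch = toupper((unsigned char)ch); /* \ca and \cA are treated alike */

    p = strchr(ctrl_order, ch);          /* the "search on an array" */
    return p != NULL ? (int)(p - ctrl_order) : -1;
}
```

With this sketch, deal_with_escape_c('A') and deal_with_escape_c('a') both return 1, deal_with_escape_c('[') returns 27, and deal_with_escape_c('?') returns -1.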

I am not sure what went wrong with the other escapes.



From perlebcdic:

chr   ord   8859-1   0037    1047 && POSIX-BC
-----------------------------------------------------------------------
\c@     0   <NUL>    <NUL>   <NUL>
\cA     1   <SOH>    <SOH>   <SOH>
\cB     2   <STX>    <STX>   <STX>
\cC     3   <ETX>    <ETX>   <ETX>
\cD     4   <EOT>    <ST>    <ST>
\cE     5   <ENQ>    <HT>    <HT>
\cF     6   <ACK>    <SSA>   <SSA>
\cG     7   <BEL>    <DEL>   <DEL>
\cH     8   <BS>     <EPA>   <EPA>
\cI     9   <HT>     <RI>    <RI>
\cJ    10   <LF>     <SS2>   <SS2>
\cK    11   <VT>     <VT>    <VT>
\cL    12   <FF>     <FF>    <FF>
\cM    13   <CR>     <CR>    <CR>
\cN    14   <SO>     <SO>    <SO>
\cO    15   <SI>     <SI>    <SI>
\cP    16   <DLE>    <DLE>   <DLE>
\cQ    17   <DC1>    <DC1>   <DC1>
\cR    18   <DC2>    <DC2>   <DC2>
\cS    19   <DC3>    <DC3>   <DC3>
\cT    20   <DC4>    <OSC>   <OSC>
\cU    21   <NAK>    <NEL>   <LF> **
\cV    22   <SYN>    <BS>    <BS>
\cW    23   <ETB>    <ESA>   <ESA>
\cX    24   <CAN>    <CAN>   <CAN>
\cY    25   <EOM>    <EOM>   <EOM>
\cZ    26   <SUB>    <PU2>   <PU2>
\c[    27   <ESC>    <SS3>   <SS3>
\c\X   28   <FS>X    <FS>X   <FS>X
\c]    29   <GS>     <GS>    <GS>
\c^    30   <RS>     <RS>    <RS>
\c_    31   <US>     <US>    <US>
\c?     *   <DEL>    <APC>   <APC>
 Ze'ev Atlas


      From: Ze'ev Atlas <zatlas1@???>
 To: Pcre Exim <pcre-dev@???>; Philip Hazel <ph10@???> 
 Sent: Tuesday, June 9, 2015 11:52 PM
 Subject: issues with EBCDIC and pcretest


Hi, I am now testing the EBCDIC package. I ignored all obvious differences, such as ¬ vs. ^ or \x0a vs. \x15 (newline), and I am concentrating on the less obvious ones. There are not too many of those (about 216 in 4 relevant files, and I would probably dismiss many of them as irrelevant differences anyway).
The issues that I see now may be marginal, but some of them are important. I do not know whether the problem is in PCRE or in pcretest.
Failure #1 (derived from testinput1)
/abcd\t\n\r\f\a\e/
    abcd\t\n\r\f\a\e9;\$\\?caxyz
No match
So I skipped the offending \e, only to find that I have a problem with both \a and \e:
/abcd\t\n\r\f\a.\371/
    abcd\t\n\r\f\a\e9;\$\\?caxyz
 0: abcd\x05\x15\x0d\x0c\x07\x1b9
\a should not be translated into \x07, and \x07 should not be matched. It is \x2f in EBCDIC. \x1b is indeed the ESC in EBCDIC, and it should have been recognized.
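
One way to narrow down whether PCRE, pcretest, or the compiler is responsible is to check which code points the compiler itself assigns to these escapes. The helper below is a minimal sketch for that purpose; dump_hex is a hypothetical name, not part of pcretest, and it simply formats bytes in the same \xNN style as the output above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of pcretest: writes each byte of s into out
   as \xNN and returns the number of bytes of s that were formatted.
   Formatting stops early if out is too small for the next byte. */
static size_t dump_hex(const char *s, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return 0;
    out[0] = '\0';

    /* Each byte needs 4 characters plus room for the terminating NUL. */
    while (s[n] != '\0' && (n + 1) * 4 < outsize) {
        sprintf(out + n * 4, "\\x%02x", (unsigned int)(unsigned char)s[n]);
        n++;
    }
    return n;
}
```

Calling dump_hex("\t\n\r\f\a", buf, sizeof buf) with a 64-byte buf and printing buf shows the native values of those escapes. On an ASCII build this gives \x09\x0a\x0d\x0c\x07; on the z/OS build the result shows what the compiler uses for \a, independently of PCRE.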
Failure #2
In testinput1, on lines 159 and 160, there is a character that I cannot recognize in my ASCII editor and that does not translate well into EBCDIC. Could you please let me know what it is? Is it a UTF-8 character that made it here by mistake?
Failure #3
Please consider:
/¬\ca\cA\c[/
    \x01\x01\e;z
No match
/¬\ca\cA.;/
    \x01\x01\e;z
 0: \x01\x01\x1b;
Again, \c[ should be ESC and thus should be \x1b according to the perlebcdic document. BTW, \c: is not defined in that document, so I ignored the tests that involve it. Ideally, PCRE should produce an error message that identifies this as an undefined escape sequence, but I understand why it does not do so for such marginal usage.
Failure #4
Please consider:
/¬[\w][\W][\s][\S][\d][\D][\b][\n][\c]][\022]/
    a+ Z0+\xf8\n\x1d\x12
No match
/¬[\w][\W][\s][\S][\d][\D]/
    a+ Z0+\xf8\n\x1d\x12
 0: a+ Z0+
/¬[\w][\W][\s][\S][\d][\D][\b]/
    a+ Z0+\xf8\n\x1d\x12
No match
Why didn't [\b] match?
Failure #5
This is a clear pcretest issue:
/¬[w-C_¬]+$/
    wxy_¬ABC
 0: wxy_¬ABC
    *** Failers
No match
    WXY
No match
It does not recognize the '*** Failers' string and tries to match it. It is no more than an annoyance, but it may point to a real issue.


 Ze'ev Atlas