Re: [pcre-dev] PCRE on EBCDIC tests

Author: ph10
Date:  
To: Ze'ev Atlas
CC: pcre-dev@exim.org
Subject: Re: [pcre-dev] PCRE on EBCDIC tests
On Fri, 29 May 2015, Ze'ev Atlas wrote:

> I ran pcretest with -C option to get:
> PCRE version 8.37 2015-04-28
> Compiled with 

...
> LF is 0x25 


That should happen only if EBCDIC_NL25 is defined.

> Newline sequence is a non-standard value: 0x0015 


... which means it is not CR or LF or CRLF.

> Unicode has a NEL character 0x85, which I guess should be equivalent to
> EBCDIC's 0x25 when 0x15 is NL, as suggested in
> http://unicode.org/standard/reports/tr13/tr13-5.html


and in PCRE's README file.

> I am not sure why the user who contributed the original EBCDIC
> patch used the name CHAR_NL rather than CHAR_LF for the character
> represented by '\n', but I guess it was because it is usually the NL
> character. I think I will change that name, and introduce CHAR_NEL for
> the other character.


PCRE defines the names CHAR_NL, CHAR_NEL, CHAR_CR, and CHAR_LF. CHAR_NL
and CHAR_LF are synonyms for the same code point in both ASCII and
EBCDIC environments. See the pcre_internal.h file.
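
As a rough illustration of how those names relate, here is a small sketch.
It is not the text of pcre_internal.h: the demo_ names are invented for this
message, and the table is chosen at run time only so that all three cases
can be seen side by side. The code points are the ones discussed in this
thread (0x15/0x25 for EBCDIC, 0x85 for NEL outside EBCDIC).

```c
#include <stdio.h>

/* Hypothetical illustration only: demo_* names are not PCRE identifiers. */
typedef struct {
    int cr;   /* carriage return */
    int lf;   /* what \n means; NL is a synonym for this code point */
    int nel;  /* the other newline-like character */
} demo_chars;

/* ebcdic: nonzero for an EBCDIC environment.
   nl25:   nonzero to mimic a build with EBCDIC_NL25 defined. */
demo_chars demo_chars_for(int ebcdic, int nl25)
{
    demo_chars c;
    c.cr = 0x0d;              /* CR is 0x0d in both ASCII and EBCDIC */
    if (!ebcdic) {
        c.lf  = 0x0a;
        c.nel = 0x85;
    } else if (nl25) {
        c.lf  = 0x25;         /* the case reported by "LF is 0x25" */
        c.nel = 0x15;
    } else {
        c.lf  = 0x15;
        c.nel = 0x25;
    }
    return c;
}

/* Print the table for one configuration. */
void demo_print_chars(int ebcdic, int nl25)
{
    demo_chars c = demo_chars_for(ebcdic, nl25);
    printf("CR=0x%02x LF(NL)=0x%02x NEL=0x%02x\n", c.cr, c.lf, c.nel);
}
```

Calling demo_print_chars(1, 1) shows the layout that corresponds to the
"LF is 0x25" line in the pcretest -C output quoted above.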

> I wrote a little program that reads my test file to a chunk of memory,
> prints it and dump the memory.  You will see below that the equivalent
> of \n is indeed 21=0x15.  My conclusion is that we need a way to tell
> PCRE that \n is not the LF character, but the NL character or
> otherwise, dictate what \n should be.


There are two issues here:

(1) What code point does \n represent?
(2) What code point (or pair of code points) represents newline?

In PCRE, the answer to (1) is the CHAR_LF character, which in an EBCDIC
environment is either 0x15 or 0x25 (0x25 only when EBCDIC_NL25 is
defined). The answer to (2) is determined separately, by the value of
the NEWLINE macro. What is the setting of NEWLINE in your config.h file?
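
To make the distinction between (1) and (2) concrete, here is a minimal
sketch of how a NEWLINE value can be compared against the CR and LF code
points. It is an illustration, not code from PCRE or pcretest: the demo_
names are invented, and it assumes the usual NEWLINE encodings (a single
code point, a two-character sequence packed as (first << 8) | second, such
as 3338 for CRLF in ASCII, and -1 or -2 for ANY or ANYCRLF).

```c
/* Hypothetical illustration only; not taken from the PCRE sources. */
typedef enum {
    DEMO_NL_CR,
    DEMO_NL_LF,
    DEMO_NL_CRLF,
    DEMO_NL_ANY,
    DEMO_NL_ANYCRLF,
    DEMO_NL_NONSTANDARD
} demo_newline_kind;

/* newline: the configured NEWLINE value.
   cr, lf:  the code points that CR and LF have in this build. */
demo_newline_kind demo_classify_newline(int newline, int cr, int lf)
{
    if (newline == -1) return DEMO_NL_ANY;
    if (newline == -2) return DEMO_NL_ANYCRLF;
    if (newline == cr) return DEMO_NL_CR;
    if (newline == lf) return DEMO_NL_LF;
    if (newline == ((cr << 8) | lf)) return DEMO_NL_CRLF;
    return DEMO_NL_NONSTANDARD;   /* none of the recognized conventions */
}
```

With LF at 0x25 and a newline value of 0x15, as in the output quoted above,
demo_classify_newline(0x15, 0x0d, 0x25) gives DEMO_NL_NONSTANDARD; with LF
at 0x15 the same newline value would be classified as LF.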

Philip

--
Philip Hazel