I ran pcretest with the -C option to get:
PCRE version 8.37 2015-04-28
Compiled with
  EBCDIC code support: LF is 0x25
  8-bit support
  No UTF-8 support
  No Unicode properties support
  No just-in-time compiler support
  Newline sequence is a non-standard value: 0x0015
  \R matches all Unicode newlines
  Internal link size = 2
  POSIX malloc threshold = 10
  Parentheses nest limit = 250
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack
This reminded me of an old conversation we had some time ago:
<snip>
I've been reading various web pages about EBCDIC systems, and they suggest that the NL EBCDIC character (0x15) is used as the equivalent of ASCII LF, though EBCDIC does have its own LF character (0x25) which was mentioned as sometimes used. Ze'ev's experience suggests that 0x15 is used in his environment.
Unicode has a NEL character 0x85, which I guess should be equivalent to EBCDIC's 0x25 when 0x15 is NL, as suggested in
http://unicode.org/standard/reports/tr13/tr13-5.html
I am not sure why the user who contributed the original EBCDIC patch used the name CHAR_NL rather than CHAR_LF for the character represented by '\n', but I guess it was because it is usually the NL character. I think I will change that name, and introduce CHAR_NEL for the other character.
Ze'ev: please can you check that '\n' really is 0x15 in your environment?
<snip>
I wrote a little program that reads my test file into a chunk of memory, prints it, and dumps the memory. You will see below that the equivalent of \n is indeed 21 = 0x15. My conclusion is that we need a way to tell PCRE that \n is not the LF character but the NL character, or otherwise to dictate what \n should be.
/(?<=foo\n)¬bar/Im
    foo\x0Fbarbar
    ***Failers
    rhubarb
    barbell
    abc\nbarton

/¬(?<=foo\n)bar/Im
    foo\x15barbar
    ***Failers
    rhubarb
    barbell
    abc\nbarton

/(?<=foo\x0F)¬bar/Im
    foo\x0Fbarbar
    ***Failers
    rhubarb
    barbell
    abc\nbarton

/¬(?<=foo\x15)bar/Im
    foo\x15barbar
    ***Failers
    rhubarb
    barbell
    abc\nbarton

/(?>¬abc)/Im
    abc
    def\nabc
    *** Failers
    defabc

/(?<=ab(c+)d)ef/

/(?<=ab(?<=c+)d)ef/

/(?<=ab(c|de)f)g/

082FF5E0 | 61 4D 6F 4C 7E 86 96 96 E0 95 5D 5F 82 81 99 61 | /(?<=foo\n)¬bar/
082FF5F0 | C9 94 15 40 40 40 40 86 96 96 E0 A7 F0 C6 82 81 | Im foo\x0Fba
082FF600 | 99 82 81 99 40 15 40 40 40 40 5C 5C 5C C6 81 89 | rbar ***Fai
082FF610 | 93 85 99 A2 15 40 40 40 40 99 88 A4 82 81 99 82 | lers rhubarb
082FF620 | 15 40 40 40 40 82 81 99 82 85 93 93 15 40 40 40 | barbell
082FF630 | 40 81 82 83 E0 95 82 81 99 A3 96 95 15 15 61 5F | abc\nbarton /¬
082FF640 | 4D 6F 4C 7E 86 96 96 E0 95 5D 82 81 99 61 C9 94 | (?<=foo\n)bar/Im
082FF650 | 15 40 40 40 40 86 96 96 E0 A7 F1 F5 82 81 99 82 | foo\x15barb
082FF660 | 81 99 40 15 40 40 40 40 5C 5C 5C C6 81 89 93 85 | ar ***Faile
082FF670 | 99 A2 15 40 40 40 40 99 88 A4 82 81 99 82 15 40 | rs rhubarb
082FF680 | 40 40 40 82 81 99 82 85 93 93 15 40 40 40 40 81 | barbell a
082FF690 | 82 83 E0 95 82 81 99 A3 96 95 15 15 61 4D 6F 4C | bc\nbarton /(?<
082FF6A0 | 7E 86 96 96 E0 A7 F0 C6 5D 5F 82 81 99 61 C9 94 | =foo\x0F)¬bar/Im
082FF6B0 | 15 40 40 40 40 86 96 96 E0 A7 F0 C6 82 81 99 82 | foo\x0Fbarb
082FF6C0 | 81 99 40 15 40 40 40 40 5C 5C 5C C6 81 89 93 85 | ar ***Faile
082FF6D0 | 99 A2 15 40 40 40 40 99 88 A4 82 81 99 82 15 40 | rs rhubarb
082FF6E0 | 40 40 40 82 81 99 82 85 93 93 15 40 40 40 40 81 | barbell a
082FF6F0 | 82 83 E0 95 82 81 99 A3 96 95 15 15 61 5F 4D 6F | bc\nbarton /¬(?
082FF700 | 4C 7E 86 96 96 E0 A7 F1 F5 5D 82 81 99 61 C9 94 | <=foo\x15)bar/Im
082FF710 | 15 40 40 40 40 86 96 96 E0 A7 F1 F5 82 81 99 82 | foo\x15barb
082FF720 | 81 99 40 15 40 40 40 40 5C 5C 5C C6 81 89 93 85 | ar ***Faile
082FF730 | 99 A2 15 40 40 40 40 99 88 A4 82 81 99 82 15 40 | rs rhubarb
082FF740 | 40 40 40 82 81 99 82 85 93 93 15 40 40 40 40 81 | barbell a
082FF750 | 82 83 E0 95 82 81 99 A3 96 95 15 15 61 4D 6F 6E | bc\nbarton /(?>
082FF760 | 5F 81 82 83 5D 61 C9 94 15 40 40 40 40 81 82 83 | ¬abc)/Im abc
082FF770 | 15 40 40 40 40 84 85 86 E0 95 81 82 83 15 40 40 | def\nabc
082FF780 | 40 40 5C 5C 5C 40 C6 81 89 93 85 99 A2 15 40 40 | *** Failers
082FF790 | 40 40 84 85 86 81 82 83 15 15 61 4D 6F 4C 7E 81 | defabc /(?<=a
082FF7A0 | 82 4D 83 4E 5D 84 5D 85 86 61 15 15 61 4D 6F 4C | b(c+)d)ef/ /(?<
082FF7B0 | 7E 81 82 4D 6F 4C 7E 83 4E 5D 84 5D 85 86 61 15 | =ab(?<=c+)d)ef/
082FF7C0 | 15 61 4D 6F 4C 7E 81 82 4D 83 4F 84 85 5D 86 5D | /(?<=ab(c|de)f)
082FF7D0 | 87 61 15 __ __ __ __ __ __ __ __ __ __ __ __ __ | g/

Ze'ev Atlas
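The dump program itself is not included in this message. For anyone who wants to produce a comparable dump, here is a minimal sketch in C of how the address and hex columns could be generated. The names format_row and dump are hypothetical, the printable-text column is left out (it would need an EBCDIC-aware mapping), and the "__" padding simply imitates the last row of the dump above.

```c
#include <stdio.h>

/* Hypothetical helper: format one row of up to 16 bytes as
   "ADDRESS | XX XX ... XX |", padding past the end of the data with "__".
   Returns the number of bytes consumed from buf. */
size_t format_row(const unsigned char *buf, size_t len,
                  unsigned long addr, char *out, size_t outsz)
{
    size_t n = len < 16 ? len : 16;
    int pos = snprintf(out, outsz, "%08lX |", addr);

    for (size_t i = 0; i < 16 && pos > 0 && (size_t)pos < outsz; i++) {
        if (i < n)
            pos += snprintf(out + pos, outsz - (size_t)pos, " %02X", buf[i]);
        else
            pos += snprintf(out + pos, outsz - (size_t)pos, " __");
    }
    if (pos > 0 && (size_t)pos < outsz)
        snprintf(out + pos, outsz - (size_t)pos, " |");
    return n;
}

/* Hypothetical driver: print a whole buffer, one row per 16 bytes.
   base is the address (or offset) shown for the first row. */
void dump(const unsigned char *buf, size_t len, unsigned long base)
{
    char row[80];   /* a full row needs 60 characters plus the terminator */

    for (size_t off = 0; off < len; ) {
        size_t n = format_row(buf + off, len - off, base + off,
                              row, sizeof row);
        puts(row);
        off += n;
    }
}
```

After reading the test file into a buffer, a call such as dump(buf, len, 0) would print offsets rather than memory addresses in the first column.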
From: Ze'ev Atlas <zatlas1@???>
To: "pcre-dev@???" <pcre-dev@???>
Sent: Thursday, May 28, 2015 3:31 PM
Subject: Re: PCRE on EBCDIC tests
\n is definitely not defined correctly
I ran 4 tests. Note that new line is defined correctly as 21 = 0x15 in EBCDIC
/(?<=foo\n)¬bar/Im
Capturing subpattern count = 0
Max lookbehind = 4
Contains explicit CR or LF match
Options: multiline
No first char
Need char = 'r'
    foo\x0Fbarbar
No match
    ***Failers
No match
    rhubarb
No match
    barbell
No match
    abc\nbarton
No match

/¬(?<=foo\n)bar/Im
Capturing subpattern count = 0
Max lookbehind = 4
Contains explicit CR or LF match
Options: multiline
First char at start or follows newline
Need char = 'r'
    foo\x15barbar
No match
    ***Failers
No match
    rhubarb
No match
    barbell
No match
    abc\nbarton
No match

/(?<=foo\x0F)¬bar/Im
Capturing subpattern count = 0
Max lookbehind = 4
Options: multiline
No first char
Need char = 'r'
    foo\x0Fbarbar
No match
    ***Failers
No match
    rhubarb
No match
    barbell
No match
    abc\nbarton
No match

/¬(?<=foo\x15)bar/Im
Capturing subpattern count = 0
Max lookbehind = 4
Options: multiline
First char at start or follows newline
Need char = 'r'
    foo\x15barbar
 0: bar
    ***Failers
No match
    rhubarb
No match
    barbell
No match
    abc\nbarton
No match
In my config.h I have
#ifndef NEWLINE
#define NEWLINE 21
#endif
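As a cross-check on the question of what '\n' really is in a given environment, a small C sketch like the following could be compiled with the same compiler that builds PCRE. The NEWLINE fallback mirrors the config.h fragment above; the function names newline_matches_config and describe_newline are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

#ifndef NEWLINE
#define NEWLINE 21   /* same fallback as the config.h fragment above */
#endif

/* Hypothetical check: returns 1 if the compiler's '\n' has the same
   character code as the configured NEWLINE value, 0 otherwise. */
int newline_matches_config(void)
{
    return (unsigned char)'\n' == NEWLINE;
}

/* Hypothetical helper: writes both values in hex into out, in the form
   '\n' = 0xNN, NEWLINE = 0xNN. Returns what snprintf returns. */
int describe_newline(char *out, size_t outsz)
{
    return snprintf(out, outsz, "'\\n' = 0x%02X, NEWLINE = 0x%02X",
                    (unsigned)(unsigned char)'\n', (unsigned)NEWLINE);
}
```

Printing the string from describe_newline shows both codes side by side. On a system where the compiler maps '\n' to EBCDIC NL, the two values would be expected to agree; on an ASCII system they will not, since '\n' is 0x0A there.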