Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2315] PCRE2_NEWLINE_ANYCRLF appears to be nonfunctional
https://bugs.exim.org/show_bug.cgi?id=2315

--- Comment #8 from Rich Siegel <siegel@???> ---
Right you are - after building a clean set of sources and re-testing (I had to
set PCRE_SUPPORT_UNICODE_PROPERTIES = 1 first), my PCRE1 test case started to
fail.

I believe I have figured out what's going on: I diffed the CMakeCache.txt from
my svn working copy, and from the clean sources. What I found was this:

- in the "clean" PCRE1 build, CMAKE_C_FLAGS was empty:

     //Flags used by the C compiler during all build types.
     CMAKE_C_FLAGS:STRING=


- in my svn working copy of PCRE1, which was able to match LF line breaks when
"\r" occurred in a pattern, I had:

     //Flags used by the compiler during all build types.
     CMAKE_C_FLAGS:STRING=-mno-mmx -mno-sse -DLINK_SIZE=4 -DESC_r=CHAR_LF


Clearly, the "-DESC_r=CHAR_LF" is the salient difference, and it explains not
only (a) the behavior difference between my working copy and the "clean" PCRE1
build, but also (b) why "\r" in a pattern will match an LF that occurs in the
subject in the PCRE1 that I've been using.
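
To make the mechanism concrete, here is a minimal stand-alone sketch of what I believe is happening. It is not PCRE code: escape_literal() and backslash_r_matches() are names I made up for illustration, and the switch stands in for the real escape table. The esc_r parameter models whatever value ESC_r had when the library was built.

```c
#define CHAR_LF '\n'
#define CHAR_CR '\r'

/* Hypothetical stand-in for the escape table lookup: returns the literal
   character that a backslash escape in a pattern stands for, or 0 when
   the escape is not a simple literal. esc_r models the build-time value
   of ESC_r (an assumption for illustration, not the PCRE internals). */
static int escape_literal(int letter, int esc_r)
{
    switch (letter) {
    case 'n': return CHAR_LF;
    case 'r': return esc_r;   /* the only entry affected by -DESC_r=... */
    case 't': return '\t';
    default:  return 0;
    }
}

/* Nonzero when the one-character pattern "\r" would match subject_char. */
static int backslash_r_matches(int esc_r, int subject_char)
{
    return escape_literal('r', esc_r) == subject_char;
}
```

With esc_r set to CHAR_CR (the clean build), backslash_r_matches() rejects an LF in the subject; with esc_r set to CHAR_LF (my working copy), it accepts it.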

Now, here's where the plot thickens.

I was staring at this configuration, and wondering why it didn't work in PCRE2.
So I did some searching of the sources, and determined that PCRE2 doesn't use
"ESC_r" anymore.

So it didn't matter at all that I was setting it in CMAKE_C_FLAGS. I suspect that
this override was lost in the new work that was done for PCRE2.

I'd be grateful if there could be some supported way to choose at compile
time whether "\r" maps to CHAR_CR (the factory default, preserving current
behavior), CHAR_LF ("\r" matches LF, my current use case), or something else.
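
As a rough sketch of the kind of switch I have in mind (not a proposal for the actual API): the macro defaults to CHAR_CR unless the build overrides it with something like -DESC_r=CHAR_LF. The esc_r_target() helper is hypothetical and exists only so the chosen mapping can be inspected; the CHAR_* definitions are simplified ASCII stand-ins.

```c
#define CHAR_LF '\n'
#define CHAR_CR '\r'

/* Default preserves current behavior; a build may override it,
   for example with -DESC_r=CHAR_LF on the compiler command line. */
#ifndef ESC_r
#define ESC_r CHAR_CR
#endif

/* Hypothetical helper (not part of PCRE2): reports which character
   "\r" in a pattern was configured to stand for in this build. */
static const char *esc_r_target(void)
{
    switch (ESC_r) {
    case CHAR_CR: return "CR";
    case CHAR_LF: return "LF";
    default:      return "other";
    }
}
```

Built without any override, esc_r_target() reports "CR"; built with -DESC_r=CHAR_LF it reports "LF".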

Meanwhile, I've made two changes:

- in pcre2_internal.h, I brought over a conditional definition of "ESC_r", so
that if it's not explicitly set in CMAKE_C_FLAGS, it'll default to CHAR_CR.

- in pcre2_compile.c, I changed "CHAR_CR" in the escape tables to use "ESC_r".

Here are the patches.

===================================================================
--- pcre2_compile.c    (revision 1003)
+++ pcre2_compile.c    (working copy)
@@ -521,7 +521,7 @@
      0,                       0,
      CHAR_LF,                 0,
      -ESC_p,                  0,
-     CHAR_CR,                 -ESC_s,
+     ESC_r,                  -ESC_s,
      CHAR_HT,                 0,
      -ESC_v,                  -ESC_w,
      0,                       0,
@@ -549,7 +549,7 @@
 /*  80 */         CHAR_BEL, -ESC_b,       0, -ESC_d, CHAR_ESC, CHAR_FF,     0,
 /*  88 */ -ESC_h,        0,      0,     '{',      0,        0,       0,     0,
 /*  90 */      0,        0, -ESC_k,       0,      0,  CHAR_LF,       0, -ESC_p,
-/*  98 */      0,  CHAR_CR,      0,     '}',      0,        0,       0,     0,
+/*  98 */      0,    ESC_r,      0,     '}',      0,        0,       0,     0,
 /*  A0 */      0,      '~', -ESC_s, CHAR_HT,      0,   -ESC_v,  -ESC_w,     0,
 /*  A8 */      0,   -ESC_z,      0,       0,      0,      '[',       0,     0,
 /*  B0 */      0,        0,      0,       0,      0,        0,       0,     0,


===================================================================
--- pcre2_internal.h    (revision 1003)
+++ pcre2_internal.h    (working copy)
@@ -369,6 +369,10 @@
 Any changes should ensure that the various macros are kept in step with each
 other. NOTE: The values also appear in pcre2_jit_compile.c. */


+#ifndef ESC_r
+#define ESC_r CHAR_CR
+#endif
+
 /* -------------- ASCII/Unicode environments -------------- */

 #ifndef EBCDIC
