Re: [pcre-dev] [PATCH] PCRE2 on Windows

Top Page

Reply to this message
Author: Daniel Richard G.
Date:  
To: ph10, pcre-dev
Subject: Re: [pcre-dev] [PATCH] PCRE2 on Windows
Hi Philip,

Apologies for the delay; many fires to put out last week.

On Sun, 2018 Apr 29 16:20+0100, ph10@??? wrote:
> On Sat, 28 Apr 2018, I wrote:
>
> > So the problem is why is the match limit not biting in your
> > environment? Did you change the default when you build PCRE2?
>
> I now think you must have done so, because when I build with --with-match-
> limit=4000000000 I get the same errors as you did.


About the only significant configuration tweak I'm making is
--with-link-size=4, needed to handle some gargantuan regexes.

> I have now patched the JIT test program (pcre2_jit_test) to
> ensure that it sets a match limit no greater than the 10 000 000
> default (r934).
>
> I think all the issues you raised are now dealt with. Thanks for
> bringing them up.


Everything is now testing out cleanly for me on Windows, once pcre2test
is built with /STACK:3000000. I'll see if I can get that in there so
that this does not have to be put in manually.

For now, I have a few further changes to contribute. The attached patch
is against r934:

* In both the Autoconf and CMake configuration, set variables to 0 or 1
indicating the presence or absence of both inttypes.h and stdint.h.

* These variables are then baked into the generated pcre.h, so that it
#includes one or none of these headers depending on whether the system
has them or not.

(Note that inttypes.h handily provides the integer-related definitions
needed to build PCRE2 on older Unix systems that do not have
stdint.h.)

If the system has neither header (as on Windows and even older Unix
systems), then for now, the consuming application would be responsible
for defining the integer-related symbols needed by the PCRE2 API.

* RunGrepTest: Changed all the double-quoted printf(1) strings to
single-quoted, to avoid any potential escaping issues.

* Older versions of printf(1) do not interpret "\x00", but can handle
"\0" or "\000" just fine.

* Older C compilers choke on C++ comments.

* Some older C compilers, like MSVC, do not support "%td". I added
a check that uses "%lu" for any non-C99 build environment, as a
catch-all.

* Similarly, older C compilers do not support "%zu". I changed this to
"%lu" and a cast, as I presume pcre2test is not likely to ever handle
a >4 GB pattern.


With this, there are still a few issues remaining:

* The /STACK:3000000 thing on Windows noted earlier.

* Possibly some way of defining the integer types needed for systems
that lack both inttypes.h and stdint.h, although I'm not sure how to
safely define e.g. uint32_t for the benefit of prototypes in pcre2.h
since the consuming application will necessarily see those
definitions too.

* Currently, the generic pcre2.h header (under src/) in the source tree
is used instead of the generated pcre2.h in the build tree. This is
due to the use of #include"pcre2.h" instead of #include<pcre2.h>.
Would it be all right to change all instances of the former to the
latter in the PCRE2 source?

* I am seeing some errors from the French-locale test on an older
Solaris system, which has fr_CA. I'm not quite sure how to diagnose
the problem. Would you be able to make heads or tails of it if I
posted the output? (It's conceivable that this system is using a
different encoding for this locale, although unfortunately the system
iconv facilities have not been helpful.)


--Daniel


--
Daniel Richard G. || skunk@???
My ASCII-art .sig got a bad case of Times New Roman.
Index: CMakeLists.txt
===================================================================
--- CMakeLists.txt    (revision 934)
+++ CMakeLists.txt    (working copy)
@@ -113,6 +113,15 @@
 CHECK_INCLUDE_FILE(unistd.h     HAVE_UNISTD_H)
 CHECK_INCLUDE_FILE(windows.h    HAVE_WINDOWS_H)


+SET(PCRE2_HAVE_INTTYPES_H 0)
+SET(PCRE2_HAVE_STDINT_H 0)
+IF(HAVE_INTTYPES_H)
+  SET(PCRE2_HAVE_INTTYPES_H 1)
+ENDIF(HAVE_INTTYPES_H)
+IF(HAVE_STDINT_H)
+  SET(PCRE2_HAVE_STDINT_H 1)
+ENDIF(HAVE_STDINT_H)
+
 CHECK_FUNCTION_EXISTS(bcopy     HAVE_BCOPY)
 CHECK_FUNCTION_EXISTS(memmove   HAVE_MEMMOVE)
 CHECK_FUNCTION_EXISTS(strerror  HAVE_STRERROR)
Index: RunGrepTest
===================================================================
--- RunGrepTest    (revision 934)
+++ RunGrepTest    (working copy)
@@ -600,7 +600,7 @@
 echo "RC=$?" >>testtrygrep


echo "---------------------------- Test 119 -----------------------------" >>testtrygrep
-printf "123\n456\n789\n---abc\ndef\nxyz\n---\n" >testNinputgrep
+printf '123\n456\n789\n---abc\ndef\nxyz\n---\n' >testNinputgrep
$valgrind $vjs $pcre2grep -Mo '(\n|[^-])*---' testNinputgrep >>testtrygrep
echo "RC=$?" >>testtrygrep

@@ -631,7 +631,7 @@
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 125 -----------------------------" >>testtrygrep
-printf "abcd\n" >testNinputgrep
+printf 'abcd\n' >testNinputgrep
$valgrind $vjs $pcre2grep --colour=always '(?<=\K.)' testNinputgrep >>testtrygrep
echo "RC=$?" >>testtrygrep
$valgrind $vjs $pcre2grep --colour=always '(?=.\K)' testNinputgrep >>testtrygrep
@@ -642,8 +642,8 @@
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 126 -----------------------------" >>testtrygrep
-printf "Next line pattern has binary zero\nABC\x00XYZ\n" >testtemp1grep
-printf "ABC\x00XYZ\nABCDEF\nDEFABC\n" >testtemp2grep
+printf 'Next line pattern has binary zero\nABC\0XYZ\n' >testtemp1grep
+printf 'ABC\0XYZ\nABCDEF\nDEFABC\n' >testtemp2grep
$valgrind $vjs $pcre2grep -a -f testtemp1grep testtemp2grep >>testtrygrep
echo "RC=$?" >>testtrygrep

@@ -687,25 +687,25 @@
# starts with a hyphen. These tests are run in the build directory.

echo "Testing pcre2grep newline settings"
-printf "abc\rdef\r\nghi\njkl" >testNinputgrep
+printf 'abc\rdef\r\nghi\njkl' >testNinputgrep

-printf "%c--------------------------- Test N1 ------------------------------\r\n" - >testtrygrep
+printf '%c--------------------------- Test N1 ------------------------------\r\n' - >testtrygrep
$valgrind $vjs $pcre2grep -n -N CR "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep

-printf "%c--------------------------- Test N2 ------------------------------\r\n" - >>testtrygrep
+printf '%c--------------------------- Test N2 ------------------------------\r\n' - >>testtrygrep
$valgrind $vjs $pcre2grep -n --newline=crlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep

-printf "%c--------------------------- Test N3 ------------------------------\r\n" - >>testtrygrep
+printf '%c--------------------------- Test N3 ------------------------------\r\n' - >>testtrygrep
pattern=`printf 'def\rjkl'`
$valgrind $vjs $pcre2grep -n --newline=cr -F "$pattern" testNinputgrep >>testtrygrep

-printf "%c--------------------------- Test N4 ------------------------------\r\n" - >>testtrygrep
+printf '%c--------------------------- Test N4 ------------------------------\r\n' - >>testtrygrep
$valgrind $vjs $pcre2grep -n --newline=crlf -F -f $srcdir/testdata/greppatN4 testNinputgrep >>testtrygrep

-printf "%c--------------------------- Test N5 ------------------------------\r\n" - >>testtrygrep
+printf '%c--------------------------- Test N5 ------------------------------\r\n' - >>testtrygrep
$valgrind $vjs $pcre2grep -n --newline=any "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep

-printf "%c--------------------------- Test N6 ------------------------------\r\n" - >>testtrygrep
+printf '%c--------------------------- Test N6 ------------------------------\r\n' - >>testtrygrep
$valgrind $vjs $pcre2grep -n --newline=anycrlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep

# It seems inpossible to handle NUL characters easily in Solaris (aka SunOS).
@@ -713,10 +713,10 @@
# don't run this test under SunOS. Fudge the output so that the comparison
# works. A similar problem has also been reported for MacOS (Darwin).

-printf "%c--------------------------- Test N7 ------------------------------\r\n" - >>testtrygrep
+printf '%c--------------------------- Test N7 ------------------------------\r\n' - >>testtrygrep
 uname=`uname`
 if [ "$uname" != "SunOS" -a "$uname" != "Darwin" ] ; then
-  printf "abc\0def" >testNinputgrep
+  printf 'abc\0def' >testNinputgrep
   $valgrind $vjs $pcre2grep -na --newline=nul "^(abc|def)" testNinputgrep | sed 's/\x00/ZERO/' >>testtrygrep
   echo "" >>testtrygrep
 else
Index: configure.ac
===================================================================
--- configure.ac    (revision 934)
+++ configure.ac    (working copy)
@@ -451,6 +451,11 @@
 AC_CHECK_HEADERS([windows.h], [HAVE_WINDOWS_H=1])
 AC_CHECK_HEADERS([sys/wait.h], [HAVE_SYS_WAIT_H=1])


+PCRE2_HAVE_INTTYPES_H=`test "x$ac_cv_header_inttypes_h" = "xyes" && echo 1 || echo 0`
+PCRE2_HAVE_STDINT_H=`test "x$ac_cv_header_stdint_h" = "xyes" && echo 1 || echo 0`
+AC_SUBST([PCRE2_HAVE_INTTYPES_H])
+AC_SUBST([PCRE2_HAVE_STDINT_H])
+
 # Conditional compilation
 AM_CONDITIONAL(WITH_PCRE2_8, test "x$enable_pcre2_8" = "xyes")
 AM_CONDITIONAL(WITH_PCRE2_16, test "x$enable_pcre2_16" = "xyes")
Index: src/pcre2.h.in
===================================================================
--- src/pcre2.h.in    (revision 934)
+++ src/pcre2.h.in    (working copy)
@@ -81,12 +81,27 @@
 #define PCRE2_CALL_CONVENTION
 #endif


+/* Does this system have inttypes.h and/or stdint.h? */
+
+#if @PCRE2_HAVE_INTTYPES_H@
+#define PCRE2_HAVE_INTTYPES_H
+#endif
+
+#if @PCRE2_HAVE_STDINT_H@
+#define PCRE2_HAVE_STDINT_H
+#endif
+
/* Have to include limits.h, stdlib.h and stdint.h to ensure that size_t and
uint8_t, UCHAR_MAX, etc are defined. */

#include <limits.h>
#include <stdlib.h>
+
+#if defined(PCRE2_HAVE_STDINT_H)
#include <stdint.h>
+#elif defined(PCRE2_HAVE_INTTYPES_H)
+#include <inttypes.h>
+#endif

/* Allow for C++ users compiling this directly. */

Index: src/pcre2_compile.c
===================================================================
--- src/pcre2_compile.c    (revision 934)
+++ src/pcre2_compile.c    (working copy)
@@ -63,8 +63,8 @@


/* Other debugging code can be enabled by these defines. */

-// #define DEBUG_SHOW_CAPTURES
-// #define DEBUG_SHOW_PARSED
+/* #define DEBUG_SHOW_CAPTURES */
+/* #define DEBUG_SHOW_PARSED */

 /* There are a few things that vary with different code unit sizes. Handle them
 by defining macros in order to minimize #if usage. */
Index: src/pcre2_match.c
===================================================================
--- src/pcre2_match.c    (revision 934)
+++ src/pcre2_match.c    (working copy)
@@ -45,9 +45,9 @@


/* These defines enables debugging code */

-//#define DEBUG_FRAMES_DISPLAY
-//#define DEBUG_SHOW_OPS
-//#define DEBUG_SHOW_RMATCH
+/* #define DEBUG_FRAMES_DISPLAY */
+/* #define DEBUG_SHOW_OPS */
+/* #define DEBUG_SHOW_RMATCH */

 #ifdef DEBUG_FRAME_DISPLAY
 #include <stdarg.h>
Index: src/pcre2test.c
===================================================================
--- src/pcre2test.c    (revision 934)
+++ src/pcre2test.c    (working copy)
@@ -80,7 +80,7 @@


/* Debugging code enabler */

-// #define DEBUG_SHOW_MALLOC_ADDRESSES
+/* #define DEBUG_SHOW_MALLOC_ADDRESSES */

/* Both libreadline and libedit are optionally supported. The user-supplied
original patch uses readline/readline.h for libedit, but in at least one system
@@ -163,7 +163,7 @@
#endif

/* VC doesn't support "%td". */
-#ifdef _MSC_VER
+#if defined(_MSC_VER) || __STDC_VERSION__ < 199901L
#define PTR_SPEC "%lu"
#else
#define PTR_SPEC "%td"
@@ -5470,8 +5470,8 @@

   if (rc != 0)
     {
-    fprintf(outfile, "** Pattern conversion error at offset %zu: ",
-      converted_length);
+    fprintf(outfile, "** Pattern conversion error at offset %lu: ",
+      (unsigned long)converted_length);
     convert_return = print_error_message(rc, "", "\n")? PR_SKIP:PR_ABEND;
     }