Re: [pcre-dev] [PATCH] PCRE2 on Windows

Top Page

Reply to this message
Author: Daniel Richard G.
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] [PATCH] PCRE2 on Windows
Hi Philip,

Apologies for the delay---I had a number of fires to put out this week.
Thank you for looking at this.

On Thu, 2018 Apr 19 18:18+0100, ph10@??? wrote:
>
> > * pcre2_dfa_match.c: Several invocations of pcre2test were crashing
> > due to _chkstk(). Meaning, the program ran out of stack space. I
> > tracked the crash down to this file. To make a long story short,
> > all those big local_offsets[] and local_workspace[] arrays are the
> > source of the problem. Reducing their size from 1000 to 200
> > elements allows the test to run to completion.
>
> I am not sure what do do about this. I don't want to make the
> reduction in your patch because, sure as eggs are eggs, somebody's
> pattern match will need the higher number. I don't really want to use
> the heap either, as at the moment pcre2_dfa_match() doesn't use the
> heap at all. Having said that, pcre2_dfa_match() is not the main
> matching function and (I assume) much less used that pcre2_match().
> More study of this issue needed.


(Will address this in my reply to your latest message)

> > * The code currently cannot be compiled without a stdint.h header,
> > which is available only in relatively recent versions of Visual
> > Studio. However, this portable and permissively-licensed
> > implementation of the header worked without issue:
> >
> >     http://www.azillionmonkeys.com/qed/pstdint.h

> >
> > Just rename it and drop it into the top level of the build tree.
>
> I have copied that paragraph into the NON-AUTOTOOLS-BUILD
> documentation file.


One thing I missed earlier is that the #include<stdint.h> is in the
pcre2.h header file, which means that stdint.h is needed not just to
compile PCRE2, but also any consuming application. This is a problem on
platforms that don't already provide stdint.h.

I am working on a new bit of logic that swaps in appropriate type/cpp
definitions as needed for platforms that don't have stdint.h. So far,
this is what I've found is needed:

    typedef unsigned char  PCRE2_UCHAR8;
    typedef unsigned short PCRE2_UCHAR16;
    typedef unsigned int   PCRE2_UCHAR32;


    typedef int int32_t;


    typedef PCRE2_UCHAR8  uint8_t;
    typedef PCRE2_UCHAR16 uint16_t;
    typedef PCRE2_UCHAR32 uint32_t;


    #define UINT16_MAX ((uint16_t)-1)
    #define UINT32_MAX ((uint32_t)-1)


    #define INT32_MIN -2147483648
    #define INT32_MAX  2147483647


This will need careful handling, however, given that those standard
definitions should not be publicly provided by PCRE2.

Work in progress.

> > * Test 6 still crashes due to running out of stack space, only in this
> > case, it's a very deep call stack that is the issue. I had to add
> > /STACK:3000000 to the linker invocation for this issue to go away.
>
> This is DFA matching again. On Linux it runs with a 2MB stack on my box.
> The Linux default is 8MB and I continue to be amazed that in these days
> of gigabyte memories, default stack sizes are so small. Not sure what to
> do about this yet, either. As you might know, pcre2_match() was recently
> refactored so as not to make much use of the stack. Perhaps something
> drastic along the same lines is needed for pcre2_dfa_match(). Added to
> the "think about" list.


(Will likewise address this in my next reply.)

> > * Test 6 also fails because line 4932 of testinput6 contains a Ctrl-
> > Z character, which on DOS/Windows indicates EOF.
>
> That does look to be a very definite special binary character, as it's
> the only one in that test. Unfortunately, I cannot remember why or
> when that test was added, and I haven't been able to find anything in
> the ChangeLog. Would new logic something like this work?
>
> #ifdef WIN32
> On reading a line with no newline ending with fgets(), which
> pcre2test does now, try reading one more character with ... what?
> ... fgetc? ... fread? and see if it's Ctrl-Z and if so add it and
> what follows to the current line.
> #endif


When Ctrl-Z is encountered in text mode, fgetc() returns EOF, and
fread() returns zero. I even tried fseek()ing past the Ctrl-Z, but
still get EOF.

> OH! I've just noticed this comment in the source of pcre2test:
>
> /* A number of things vary for Windows builds. Originally, pcretest
> opened its input and output without "b"; then I was told that "b"
> was needed in some environments, so it was added for release 5.0 to
> both the input and output. (It makes no difference on Unix-like
> systems.) Later I was told that it is wrong for the input on
> Windows. I've now abstracted the modes into macros that are set
> here, to make it easier to fiddle with them, and removed "b" from
> the input mode under Windows. The BINARY versions are used when
> saving/restoring compiled patterns. */
>
> Perhaps that "b" was right after all? There is no way I can test any
> of this as I have no access to Windows systems.


If I change INPUT_MODE to "rb", then the test passes. I'm not sure that
this is the correct solution (one ramification of this is that CRLF line
endings in the file will not be converted to plain LF endings in the
loaded strings), but hopefully that is a useful data point.

> > * The following JIT tests failed:
>
> I'm hoping Zoltán can look at those sometime.


I am now seeing the JIT tests pass as well. I think the failures may
have cascaded from the other issues I found rather than represent actual
JIT bugs needing his attention. Let's tentatively say for now that JIT
is in good shape on Windows.

One more point of interest: A comment in the source states that PCRE2
cannot be built with MSVC6, but in fact I am finding that it builds
and tests out with that ancient toolchain no worse than newer versions
of Visual Studio. It is certainly helpful to me that this
compatibility remains, and I'll be happy to formally verify it once
the other work is done.


--Daniel


--
Daniel Richard G. || skunk@???
My ASCII-art .sig got a bad case of Times New Roman.