Re: [pcre-dev] [PATCH] PCRE2 on Windows

Top Page
Delete this message
Author: ph10
Date:  
To: Daniel Richard G.
CC: pcre-dev
Subject: Re: [pcre-dev] [PATCH] PCRE2 on Windows
On Thu, 5 Apr 2018, Daniel Richard G. wrote:

> I have been building and testing PCRE2 on Windows, using somewhat older
> versions of Visual Studio (required by my employer for customer-system
> compatibility). I have found a few issues and have attached a patch
> (against r929) for a couple of them.
>
> In patch order:
>
> * RunTest.bat: This batch script passes an argument to -error in test 2
> that does not match that in the RunTest shell script, so test 2 was
> always failing on Windows


Oops. That is clearly an oversight. Patch applied.

> * pcre2_dfa_match.c: Several invocations of pcre2test were crashing due
> to _chkstk(). Meaning, the program ran out of stack space. I tracked
> the crash down to this file. To make a long story short, all those big
> local_offsets[] and local_workspace[] arrays are the source of the
> problem. Reducing their size from 1000 to 200 elements allows the test
> to run to completion.


I am not sure what do do about this. I don't want to make the reduction
in your patch because, sure as eggs are eggs, somebody's pattern match
will need the higher number. I don't really want to use the heap either,
as at the moment pcre2_dfa_match() doesn't use the heap at all. Having
said that, pcre2_dfa_match() is not the main matching function and (I
assume) much less used that pcre2_match(). More study of this issue
needed.

> * pcre2grep.c: Needs the same snprintf() workaround as seen elsewhere
> in the tree
>
> * Also added some logic so that non-C99 snprintf() works correctly
> (returns -1 in the case of overflow)


Both patches applied.

> With all that, there are a few remaining issues:
>
> * The code currently cannot be compiled without a stdint.h header, which
> is available only in relatively recent versions of Visual Studio.
> However, this portable and permissively-licensed implementation of the
> header worked without issue:
>
>     http://www.azillionmonkeys.com/qed/pstdint.h

>
> Just rename it and drop it into the top level of the build tree.


I have copied that paragraph into the NON-AUTOTOOLS-BUILD documentation
file.

> * Test 6 still crashes due to running out of stack space, only in this
> case, it's a very deep call stack that is the issue. I had to add
> /STACK:3000000 to the linker invocation for this issue to go away.


This is DFA matching again. On Linux it runs with a 2MB stack on my box.
The Linux default is 8MB and I continue to be amazed that in these days
of gigabyte memories, default stack sizes are so small. Not sure what to
do about this yet, either. As you might know, pcre2_match() was recently
refactored so as not to make much use of the stack. Perhaps something
drastic along the same lines is needed for pcre2_dfa_match(). Added to
the "think about" list.

> * Test 6 also fails because line 4932 of testinput6 contains a Ctrl-Z
> character, which on DOS/Windows indicates EOF.


That does look to be a very definite special binary character, as it's
the only one in that test. Unfortunately, I cannot remember why or when
that test was added, and I haven't been able to find anything in the
ChangeLog. Would new logic something like this work?

#ifdef WIN32
On reading a line with no newline ending with fgets(), which pcre2test
does now, try reading one more character with ... what? ... fgetc? ...
fread? and see if it's Ctrl-Z and if so add it and what follows to the
current line.
#endif

OH! I've just noticed this comment in the source of pcre2test:

/* A number of things vary for Windows builds. Originally, pcretest
opened its input and output without "b"; then I was told that "b" was
needed in some environments, so it was added for release 5.0 to both
the input and output. (It makes no difference on Unix-like systems.)
Later I was told that it is wrong for the input on Windows. I've now
abstracted the modes into macros that are set here, to make it easier
to fiddle with them, and removed "b" from the input mode under
Windows. The BINARY versions are used when saving/restoring compiled
patterns. */

Perhaps that "b" was right after all? There is no way I can test any of
this as I have no access to Windows systems.

> * The following JIT tests failed:


I'm hoping Zoltán can look at those sometime.

Philip

--
Philip Hazel