Re: [pcre-dev] [PATCH] PCRE2 on Windows

Top Page
Delete this message
Author: ph10
Date:  
To: Daniel Richard G.
CC: pcre-dev
Subject: Re: [pcre-dev] [PATCH] PCRE2 on Windows
Hi Daniel,

Thanks for your messages and your efforts. I haven't yet looked
seriously at your new patches. However, I should say that for many years
I stuck religiously to C90 in all PCRE code, until PCRE2 came along. By
that stage I felt it was probably time to allow some newer C things
(e.g. snprintf()) to be used, so nowadays I work from C11. One can keep
on supporting historic facilities for only so long... so I'm not sure
exactly how many of your patches I will want to include when I do get
round to looking at them.

The C++ comment comment amused me because // comments were a feature of
BCPL way back in the late 1960s and it annoyed me when they were not
carried forward into C. However, I don't tend to use them very much.

> * Possibly some way of defining the integer types needed for systems
> that lack both inttypes.h and stdint.h, although I'm not sure how to
> safely define e.g. uint32_t for the benefit of prototypes in pcre2.h
> since the consuming application will necessarily see those
> definitions too.


Do you think this is a big issue? How many such systems are there still
around that are likely to want to install PCRE2?

> * Currently, the generic pcre2.h header (under src/) in the source tree
> is used instead of the generated pcre2.h in the build tree. This is
> due to the use of #include"pcre2.h" instead of #include<pcre2.h>.
> Would it be all right to change all instances of the former to the
> latter in the PCRE2 source?


... but would this not have the effect of including an *installed*
version of pcre2.h from /usr/include instead of from the build or source
trees? Oh dear. I've just looked at the C11 standard and it seems to say
that both <> and "" forms search "in an implementation defined manner"
(and indeed this is unchanged from C90). And what is more, the gcc
documentation says this:

#include <file>

    This variant is used for system header files. It searches for a file
    named file in a standard list of system directories. You can prepend
    directories to this list with the -I option (see Invocation).


#include "file"

    This variant is used for header files of your own program. It
    searches for a file named file first in the directory containing the
    current file, then in the quote directories and then the same
    directories used for <file>. You can prepend directories to the list
    of quote directories with the -iquote option.


It seems to me that the "" form is right for "header files of your own
program". Perhaps the way out of this is to ensure that there isn't a
pcre2.h file in the source tree? Hmm. I've just looked at a distribution
tarball, and there is no pcre2.h therein, only pcre2.h.generic and
pcre2.h.in. How did one get into your source tree? Perhaps you did a
previous build within that tree?

The French test: you can try posting me the output, but there do seem to
be variations in locale handling - which of course should be being
phased out in favour of Unicode. :-)

> However, I am still seeing the deep call stack that requires the
> /STACK:3000000 linker flag :(
>
> Specifically, the error is "Unhandled exception at 0xDEADBEEF in
> pcre2test.exe: 0xDEADBEEF: Stack overflow."
>
> The call stack at the first failure is as follows:
>
>     add_to_class_internal()
>     add_to_class()
>     compile_branch()
>     compile_regex()
>     compile_branch()
>     compile_regex()
>     [repeat the last two many many times]


Aha. This is a failure in compiling, so neither of the matching
interpreters is involved.
>
> It appears to be test 8 which is failing in this way.


Test 8 contains a test for deeply nested parentheses, preceded by this
comment:

# Check the absolute limit on nesting (?| etc. This varies with code unit
# width because the workspace is a different number of bytes. It will fail
# with link size 2 in 8-bit and 16-bit but not in 32-bit.

The test itself sets parens_nest_limit=1000 and does have around 1000
nested parens. Please can you try cutting out this test from the input
and see if all is well? Cutting it down to a smaller number might not be
straightforward. I wonder why nobody has complained about this issue
before?

> > That leaves only the Ctrl/Z issue. Since I can't remember why it's
> > there, I think I will just change it to some other character.
>
> If changing the input read to binary mode is deemed unacceptable, then
> this should address the problem as well.


I decided that going back to binary mode was unacceptable on the grounds
that, if Ctrl/Z is normally treated as EOF in Windows text files, it
should be treated that way by pcre2test.

> One thing I missed earlier is that the #include<stdint.h> is in the
> pcre2.h header file, which means that stdint.h is needed not just to
> compile PCRE2, but also any consuming application. This is a problem on
> platforms that don't already provide stdint.h.


How many don't? It's been around in the standard for over a decade.
I must say I'm a bit reticent about pandering too much to non-standard
environments.

> This will need careful handling, however, given that those standard
> definitions should not be publicly provided by PCRE2.


Absolutely!

> I am now seeing the JIT tests pass as well. I think the failures may
> have cascaded from the other issues I found rather than represent actual
> JIT bugs needing his attention. Let's tentatively say for now that JIT
> is in good shape on Windows.


Yes, I agree. I think some of the JIT tests were hitting some of the
stack issues.

> One more point of interest: A comment in the source states that PCRE2
> cannot be built with MSVC6, but in fact I am finding that it builds
> and tests out with that ancient toolchain no worse than newer versions
> of Visual Studio. It is certainly helpful to me that this
> compatibility remains, and I'll be happy to formally verify it once
> the other work is done.


A quick google suggests me me that MSVC6 is now 20 years old. I'm amazed
you still have a need for it when compiling new software, but hey, what
do I know? And if it works, it works...

Philip

--
Philip Hazel