Re: [pcre-dev] Re-factored pcre2_match() needs testing

Author: William A Rowe Jr
Date:  
To: pcre-dev, httpd
Subject: Re: [pcre-dev] Re-factored pcre2_match() needs testing
This is very interesting and coincidental to our efforts at the Apache httpd
project. A number of weeks ago I migrated trunk to PCRE2 provided it is
detected. Hopefully, most developers are running from trunk/bleed on
most projects, at least that's where I'm at.

We do have complications taking the temp regex groups off of the stack. The
natural place for us to put them, the local (thread-locked) request pool,
does not work out so well, since that pool was never an argument to our
function call. I think I have that resolved with my next response to the team.

Very glad you took the next layer off the stack as well, and understand that
the backrefs are non-trivial. I'm sharing this crosslist to make everyone
aware that extra attention right now would be very useful and appreciated.

Good luck!

Bill


On Thu, Mar 9, 2017 at 11:11 AM, <ph10@???> wrote:
> Folks,
>
> I have just committed seriously refactored code for pcre2_match() to the
> SVN repository. I have not yet updated the build system or the
> documentation, which I will do over the next few weeks. There won't be a
> new release for several months, but in the meantime it would be nice if
> anybody can run tests on the new code to try to shake it down as much as
> possible. (It runs all the current tests, of course, at least on my
> box.)
>
> The JIT code is not yet updated to track the interpreter changes (see
> below) but Zoltán will be doing that in due course. As well as a lot of
> code tidies, the main changes are as follows:
>
> 1. Backtracking is no longer implemented by recursive function calls,
> and therefore does not use the system stack. The
> --disable-stack-for-recursion build option is obsolete (I will make it
> give a warning). Once this is released, the regular reports of "stack
> exceeded" bugs should go away. Yay! Backtracking is implemented by using
> a vector of fixed-size "frames" (the size depends on the number of
> captures in a pattern). An initial 10K vector (enough for ~50 frames) is
> allocated on the stack, but if this is too small, heap memory is used.
>
> 2. The "match limit" and "match limit recursion" features still work.
> The first limits the number of backtracking points that are ever
> established, which is effectively a limit on computing resource. The
> second limits the depth of nested backtracking, which is effectively a
> limit on the amount of heap memory that is used. I may change the name
> of "match limit recursion" to something more suitable - perhaps "match
> limit depth" though of course the old name will be a synonym.
>
> 3. The new implementation now allows backtracking into (possibly
> recursive) subroutine calls within the pattern, which is how Perl acts.
> It would be easy to add a new option to force these calls to be atomic,
> but I would like to be sure that such an option is wanted/needed before
> adding it. An individual pattern can always use, for example, (?>(?1))
> instead of just (?1) if atomic behaviour is wanted.
>
> 4. When a callout is called, a pointer to the ovector is made available.
> Formerly, this was the ovector supplied by the caller in a match_data
> block. Now it is an internal private vector.
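
To make items 1 and 2 above concrete, here is a rough sketch in Python of a toy matcher that backtracks through an explicit list of frames instead of recursive calls, with two limits modelled on the description. This is not PCRE2 code. The pattern language (literals, ".", and a greedy "x*"), the frame layout, and the names full_match, match_limit, depth_limit, MatchLimitExceeded and DepthLimitExceeded are all assumptions made up for illustration, and Python cannot show the stack-versus-heap distinction for the frame vector.

```python
class MatchLimitExceeded(Exception):
    """Too many backtracking points were established (hypothetical name)."""


class DepthLimitExceeded(Exception):
    """Too many backtracking frames were live at once (hypothetical name)."""


def parse(pattern):
    """Split a toy pattern into (char, starred) tokens; 'x*' is one token."""
    tokens, i = [], 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append((pattern[i], starred))
        i += 2 if starred else 1
    return tokens


def accepts(ch, c):
    """A '.' token accepts any character; anything else must be equal."""
    return ch == "." or ch == c


def full_match(pattern, subject, match_limit=None, depth_limit=None):
    """Return True if the toy pattern matches the whole subject.

    Backtracking state lives in `frames`, an explicit list, so the depth of
    backtracking never touches the Python call stack.
    """
    tokens = parse(pattern)
    frames = []       # each frame: [token index, start pos, current end pos]
    established = 0   # backtracking points ever created (the "match limit")
    ti = pos = 0
    while True:
        ok = True
        while ti < len(tokens):
            ch, starred = tokens[ti]
            if starred:
                end = pos
                while end < len(subject) and accepts(ch, subject[end]):
                    end += 1          # greedy: take as much as possible
                if end > pos:         # something can be given back later
                    established += 1
                    if match_limit is not None and established > match_limit:
                        raise MatchLimitExceeded(established)
                    if depth_limit is not None and len(frames) >= depth_limit:
                        raise DepthLimitExceeded(len(frames) + 1)
                    frames.append([ti, pos, end])
                pos = end
            elif pos < len(subject) and accepts(ch, subject[pos]):
                pos += 1
            else:
                ok = False
                break
            ti += 1
        if ok and pos == len(subject):
            return True
        if not frames:
            return False              # nothing left to retry
        frame = frames[-1]            # backtrack: newest frame gives back one
        frame[2] -= 1
        ti, pos = frame[0] + 1, frame[2]
        if frame[2] == frame[1]:
            frames.pop()              # zero repetitions was its last option


if __name__ == "__main__":
    print(full_match("a*ab", "aaab"))   # True
    print(full_match("a*b", "aaa"))     # False
    try:
        full_match("a*a*a*b", "a" * 10, match_limit=5)
    except MatchLimitExceeded:
        print("match limit hit")
```

In this sketch match_limit bounds the total work (frames ever created) while depth_limit bounds how many frames are alive at the same time, which is the memory-related limit.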
>
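
The difference that item 3 above describes, between a group that can be backtracked into and one wrapped in (?>...), can be seen in a small way with Python's re module. Two assumptions to flag: Python's re does not support the (?1) subroutine syntax, so this uses a plain group in its place, and atomic groups need Python 3.11 or later. The helper name compare and the ATOMIC_SUPPORTED flag are made up for this sketch.

```python
import re
import sys

# Atomic groups, (?>...), were added to Python's re module in version 3.11.
ATOMIC_SUPPORTED = sys.version_info >= (3, 11)


def compare(subject):
    """Return (plain_matches, atomic_matches) for the whole subject.

    plain:  (a+)ab   the engine may backtrack into the group and give back
                     one "a" so that the trailing "ab" can match.
    atomic: (?>a+)ab once the group has matched, what it consumed is fixed.
    atomic_matches is None when this Python cannot compile atomic groups.
    """
    plain = re.fullmatch(r"(a+)ab", subject) is not None
    if not ATOMIC_SUPPORTED:
        return plain, None
    atomic = re.fullmatch(r"(?>a+)ab", subject) is not None
    return plain, atomic


if __name__ == "__main__":
    print(compare("aaab"))
```

On "aaab" the plain group first takes all three "a" characters and then gives one back so that "ab" can match; the atomic version cannot give anything back, so it fails.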
> I ran some timing tests on the testdata/testinput1 file and on my Linux
> box the new interpreter seems to run a bit faster than the old one.
>
> All feedback is welcome!
>
> Philip
>
> --
> Philip Hazel
> --
> ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev