I have just committed seriously refactored code for pcre2_match() to the
SVN repository. I have not yet updated the build system or the
documentation, which I will do over the next few weeks. There won't be a
new release for several months, but in the meantime it would be helpful
if anybody could run tests on the new code to shake it down as much as
possible. (It passes all the current tests, of course, at least on my
box.)
The JIT code is not yet updated to track the interpreter changes (see
below) but Zoltán will be doing that in due course. As well as a lot of
code tidies, the main changes are as follows:
1. Backtracking is no longer implemented by recursive function calls,
and therefore does not use the system stack. The
--disable-stack-for-recursion build option is obsolete (I will make it
give a warning). Once this is released, the regular reports of "stack
exceeded" bugs should go away. Yay! Backtracking is now implemented
using a vector of fixed-size "frames" (the frame size depends on the
number of captures in a pattern). An initial 10K vector (enough for ~50
frames) is allocated on the stack, but if this is too small, heap
memory is used.
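To illustrate point 1, here is a minimal C sketch of the general idea. It is
not the actual pcre2_match() code: it is a toy wildcard matcher ('*' and '?')
that keeps its backtracking points in a vector of fixed-size frames instead
of recursing. The names frame, frame_vec, push_frame, glob_match and
INITIAL_FRAMES are all made up for this example, and the frame here has a
fixed layout, whereas in the real code the frame size depends on the number
of captures.

```c
#include <stdlib.h>
#include <string.h>

#define INITIAL_FRAMES 4            /* deliberately tiny, for illustration */

/* One backtracking point: where to resume in the pattern and the subject. */
typedef struct { size_t pat; size_t sub; } frame;

typedef struct {
    frame *v;        /* current storage (initial vector or heap block) */
    size_t used;     /* number of live frames */
    size_t cap;      /* capacity of v, in frames */
    int on_heap;     /* nonzero once v points at heap memory */
} frame_vec;

/* Push a frame, moving to (or growing) heap memory when the vector is full. */
static int push_frame(frame_vec *fv, frame f)
{
    if (fv->used == fv->cap) {
        size_t ncap = fv->cap * 2;
        frame *nv = malloc(ncap * sizeof(frame));
        if (nv == NULL) return -1;
        memcpy(nv, fv->v, fv->used * sizeof(frame));
        if (fv->on_heap) free(fv->v);
        fv->v = nv;
        fv->cap = ncap;
        fv->on_heap = 1;
    }
    fv->v[fv->used++] = f;
    return 0;
}

/* Returns 1 for a match, 0 for no match, -1 if memory ran out. */
int glob_match(const char *pat, const char *sub)
{
    frame initial[INITIAL_FRAMES];  /* initial vector lives on the stack */
    frame_vec fv = { initial, 0, INITIAL_FRAMES, 0 };
    size_t p = 0, s = 0;
    int rc;

    for (;;) {
        if (pat[p] == '*') {
            /* Try '*' matching nothing first; remember the alternative
               (let '*' swallow one more character) as a backtracking point. */
            if (sub[s] != '\0') {
                frame f = { p, s + 1 };
                if (push_frame(&fv, f) != 0) { rc = -1; break; }
            }
            p++;
            continue;
        }
        if (pat[p] == '\0' && sub[s] == '\0') { rc = 1; break; }
        if (pat[p] != '\0' && sub[s] != '\0' &&
            (pat[p] == '?' || pat[p] == sub[s])) {
            p++;
            s++;
            continue;
        }
        /* Failure: backtrack by popping a frame, with no function return. */
        if (fv.used == 0) { rc = 0; break; }
        fv.used--;
        p = fv.v[fv.used].pat;
        s = fv.v[fv.used].sub;
    }

    if (fv.on_heap) free(fv.v);
    return rc;
}
```

With INITIAL_FRAMES set to 4, a call such as
glob_match("*a*a*a*a*a*a", "aaaaaa") needs six live frames, so it exercises
the switch to heap memory.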
2. The "match limit" and "match limit recursion" features still work.
The first limits the number of backtracking points that are ever
established, which is effectively a limit on computing resource. The
second limits the depth of nested backtracking, which is effectively a
limit on the amount of heap memory that is used. I may change the name
of "match limit recursion" to something more suitable - perhaps "match
limit depth" though of course the old name will be a synonym.
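The difference between the two limits in point 2 can be sketched with a toy
matcher as well. This is only an illustration under my own assumptions, not
the library code: limited_match and the M_* result codes are hypothetical
names, and the pattern language is just literals, '?' and '*'. The first
counter is charged for every backtracking point ever established; the second
caps how many can be live at once, which is what bounds the memory. (In the
library itself the limits are set with pcre2_set_match_limit() and
pcre2_set_recursion_limit().)

```c
#include <stdlib.h>

/* Hypothetical result codes for this sketch only. */
enum {
    M_MATCH = 1,
    M_NOMATCH = 0,
    M_ERR_MATCHLIMIT = -1,   /* too many backtracking points in total */
    M_ERR_DEPTHLIMIT = -2,   /* too many backtracking points live at once */
    M_ERR_NOMEM = -3
};

typedef struct { size_t pat; size_t sub; } frame;

int limited_match(const char *pat, const char *sub,
                  unsigned long match_limit, size_t depth_limit)
{
    /* The depth limit bounds the frame memory, so allocate exactly that. */
    frame *stack = malloc((depth_limit ? depth_limit : 1) * sizeof(frame));
    unsigned long established = 0;   /* backtracking points ever created */
    size_t depth = 0;                /* backtracking points currently live */
    size_t p = 0, s = 0;
    int rc;

    if (stack == NULL) return M_ERR_NOMEM;

    for (;;) {
        if (pat[p] == '*') {
            if (sub[s] != '\0') {
                /* Limit 1: total work, counted per backtracking point. */
                if (++established > match_limit) { rc = M_ERR_MATCHLIMIT; break; }
                /* Limit 2: nesting depth, i.e. memory in use. */
                if (depth == depth_limit) { rc = M_ERR_DEPTHLIMIT; break; }
                stack[depth].pat = p;
                stack[depth].sub = s + 1;
                depth++;
            }
            p++;
            continue;
        }
        if (pat[p] == '\0' && sub[s] == '\0') { rc = M_MATCH; break; }
        if (pat[p] != '\0' && sub[s] != '\0' &&
            (pat[p] == '?' || pat[p] == sub[s])) {
            p++;
            s++;
            continue;
        }
        if (depth == 0) { rc = M_NOMATCH; break; }
        depth--;                     /* backtrack to the most recent point */
        p = stack[depth].pat;
        s = stack[depth].sub;
    }

    free(stack);
    return rc;
}
```

A pattern like "*a*a*a*b" against a long run of 'a' characters never nests
more than three deep, yet establishes a very large number of backtracking
points, so it trips the first limit and not the second.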
3. The new implementation allows backtracking into (possibly
recursive) subroutine calls within the pattern, which is how Perl behaves.
It would be easy to add a new option to force these calls to be atomic,
but I would like to be sure that such an option is wanted/needed before
adding it. An individual pattern can always use, for example, (?>(?1))
instead of just (?1) if atomic behaviour is wanted.
4. When a callout is called, a pointer to the ovector is made available.
Formerly, this was the ovector supplied by the caller in a match_data
block. Now it is an internal private vector.
I ran some timing tests on the testdata/testinput1 file and on my Linux
box the new interpreter seems to run a bit faster than the old one.