Re: [pcre-dev] Re-factored pcre2_match() needs testing

Top Page
Delete this message
Author: ph10
Date:  
To: William A Rowe Jr, Ralf Junker
CC: pcre-dev, httpd
Subject: Re: [pcre-dev] Re-factored pcre2_match() needs testing
On Mon, 13 Mar 2017, William A Rowe Jr wrote:

> Very glad you took the next layer off the stack as well, and understand that
> the backrefs are non-trivial. I'm sharing this crosslist to make everyone
> aware that extra attention right now would be very useful and appreciated.


Thanks for that comment. When I first created PCRE, twenty years ago,
the regex patterns were *much* simpler and it seemed natural to use
recursive function calls (and therefore the stack). I was writing PCRE
for Exim to use, and made it a separate project "just in case" anybody
else could make use of it.

Since then, PCRE has become much more widely used than I ever expected,
and patterns have got a lot more complicated. As PCRE was hacked to
support new features, some complicated fudges were introduced to keep
the stack usage down, because stack overflow was reported regularly. (I
also learned that many environments have quite small stacks.)

The bug reported in issue 1887 was the final straw. After a couple of
days of messing around, I realized there was no way of fixing it within
the existing paradigm.

The --disable-stack-for-recursion feature had been introduced to use the
heap instead of the stack, but as it uses independent malloc-ed frames,
it ran about 30% slower in many cases.

So ... I bit the bullet and figured out a different way of using the
heap that I hoped would (a) result in cleaner code, with fewer messy
bits and (b) not be any slower. I think I have achieved (b) and at least
some of (a).

On Wed, 15 Mar 2017, Ralf Junker wrote:

> I can confirm that starting with SVN 671, PCRE2 uses less stack.


Thank you.

> Nevertheless, this test case from testinput6 still uses lots of stack (and exceeds it on a Windows machine):


Indeed, but that is using pcre2_dfa_match(), which has *not* been
changed. I don't know how widely used pcre2_dfa_match() is used, but I'm
sure it is far less than the main pcre2_match() function, which is the
one that gives Perl-compatibility. It would be possible to refactor
pcre2_dfa_match() in due course if there is enough demand, but it uses
function recursion much less than pcre2_match() used to, and mainly for
handling recursion within the pattern.

Philip

--
Philip Hazel