Re: [pcre-dev] A tweak for the NO_RECURSE option

Kezdőlap
Üzenet törlése
Szerző: Graycode
Dátum:  
Címzett: pcre-dev
Tárgy: Re: [pcre-dev] A tweak for the NO_RECURSE option
On Mon, 12 Dec 2011, Philip Hazel wrote:

> On Mon, 12 Dec 2011, Graycode wrote:
>
> > When PCRE is compiled with NO_RECURSE then it uses allocated heap
> > storage instead of recursion with the stack.
> >
> > The majority of expressions that I use do not require recursion and
> > perhaps neither does a large portion of the PCRE testdata suite.
>
> I think you might have misunderstood this. The terminology is rather
> badly chosen. "Recursion" here does not refer to recursion in the
> patterns, but to the fact that the match() function calls itself
> recursively while performing the match. This uses memory on the system
> stack, and that can run out in environments that have a limited system
> stack.


I do not think there is a misunderstanding but I will try to explain
my position for the sake of clarity. Normally the NO_RECURSE option
is not used, so match() calls itself recursively and thereby uses more
and more memory on the system stack as the level deepens.

With the NO_RECURSE option, the match() function simulates recursion
using the defined C macros RMATCH() and RRETURN(). Here the match()
function does not call itself, so it does not use true stack-based
recursion. The stack does not grow for each (simulated) recursion.
Instead a "frame" to hold the necessary variables is allocated and
later freed from the heap each time. Variables that were put on the
stack, such as 'caseless' and 'condcode', remain as a single instance
on the stack until the function returns.

What I referred to as expressions that "do not require recursion"
means those where the match() function does not invoke the RMATCH()
macro. That covers both true stack-based recursion when NO_RECURSE
is not used, and also the simulated recursion using heap allocation
when it is used. I hope that clarifies it.

Currently the initial "frame" of variables is allocated from the heap
each time match() is invoked and NO_RECURSE is being used. It's the
first thing that the match() function does. What I am suggesting here
is that the initial first-time allocation can be avoided by having a
single instance (total of one) of that initial frame be placed on the
stack.


> Almost every pattern except the very simplest will use one or
> more recursive calls. For example, the pattern /(ab)(cd)(ef)/ recurses 4
> times (you can find this out by running pcretest with the -M option).


Here I am treading into areas where my lack of knowledge about PCRE
may highlight some confusion. My having added other debugging code
indicates that the maximum depth for /(ab)(cd)(ef)/ was 3. There was
the initial frame allocation each time match() was invoked, and then
internal variable 'rdepth' never went beyond 2 more than the initital
frame.

I've been using a source HTML page of www.yahoo.com for my test data.
The content was 178,485 bytes at the time I saved it. Using the
example of /(ab)(cd)(ef)/ indicates that the match() function was
invoked 7,151 times for a single call to pcre_exec(). During those
the match() function allocated a heap frame 7,375 times not counting
the initial frame. So the total number of allocated heap frames was
14,526.

Using the PCRE doc/pcre.3 file as test data for the same pattern
/(ab)(cd)(ef)/ the pcre_exec() invoked match() 291 times and during
those it allocated another 297 heap frames (sum total of 588 heap
allocations) with the same maximum depth of 3.

Of the 5,150 expressions contained in the PCRE testdata files that
I normally run, 1,959 (roughly 38%) needed no further "recursive"
process within the match() function beyond the single initial frame.
The match() function never invoked RMATCH() to simulate a recursion
for any of those 1,959 samples. Hence my statement that a "large
portion of the PCRE testdata suite" would not require recursion.

Hopefully in a few days I'll find the time to re-verify the added
debugging code that I put in place to count the things that are
happening internally.

For my basis testing I've been using the 4 regular expressions that
were used by http://lh3lh3.users.sourceforge.net/reb.shtml
to compare the performance of various engines. The change I proposed
was not detrimental to any of those, and it did noticeably help one of
them. It increased the efficiency of the 3rd (Date) expression by
over 35 percent. This is based on elapsed time measured over running
it in series many times on the yahoo HTML file. During the testing
the data file was loaded into memory to avoid variances such as those
that could be caused by I/O operations.


> The NO_RECURSE feature is designed specifically not to put data on the
> system stack. Instead of recursively calling match() it saves what it
> needs on a heap stack, and jumps to the start of the function.


Yes, I fully comprehend that.


>
> > --- pcre_exec.c  (version 8.21-RC1)
> > +++ pcre_exec.c  (working copy)
> > @@ -328,7 +328,8 @@
> >    {\
> >    heapframe *oldframe = frame;\
> >    frame = oldframe->Xprevframe;\
> > -  (pcre_stack_free)(oldframe);\
> > +  if (oldframe != &frame_zero)\
> > +    (pcre_stack_free)(oldframe);\
> >    if (frame != NULL)\
> >      {\
> >      rrc = ra;\
> > @@ -485,8 +486,9 @@
> >  heap whenever RMATCH() does a "recursion". See the macro definitions above. */

> >
> > #ifdef NO_RECURSE
> > -heapframe *frame = (heapframe *)(pcre_stack_malloc)(sizeof(heapframe));
> > -if (frame == NULL) RRETURN(PCRE_ERROR_NOMEMORY);
> > +heapframe frame_zero;
> ^^^^^^^^^^^^^^^^^^^^^^^^
> ^^^^^^^^^^^^^^^^^^^^^^^^
> ^^^^^^^^^^^^^^^^^^^^^^^^
> This statement is creating a frame *on the system stack*, which is
> precisely the thing NO_RECURSE is trying to avoid doing.
>
> Disclaimer: I'm writing this in a hurry. Apologies if I have
> misunderstood you, but if all you are doing is saving one malloc for a
> frame, I don't think it will apply to many regexes, if any.


I hope you get a chance to review this again in the future. I am
suggesting that ONE and a maximum of only one frame be placed on the
stack. This will add a one-time total of roughly 276 bytes of stack
usage. Regardless of how many times match() and its RMATCH() is used
during execution of an expression, there will never be more than a
single instance of 'frame_zero' residing on the stack.

This is a simple tweak that seems to provide some better speed when
the NO_RECURSE option is used. That option is probably not used very
often because this isn't enough yet to make it comparable to the speed
of stack-based recursion.


Regards,
Graycode