[pcre-dev] Whipping up on the NO_RECURSE option

Top Page
Delete this message
Author: Graycode
Date:  
To: pcre-dev
Subject: [pcre-dev] Whipping up on the NO_RECURSE option
This discussion deals with the NO_RECURSE compile-time PCRE option.
It is a follow-up on a previous modification titled "A tweak for the
NO_RECURSE option" from last December.

Quoting parts of that earlier discussion:

Normally the NO_RECURSE option is not used, so match() calls itself
recursively and thereby uses more and more memory on the system stack
as the level deepens.

With the NO_RECURSE option, the match() function simulates recursion
using the defined C macros RMATCH() and RRETURN(). Here the match()
function does not call itself, so it does not use true stack-based
recursion.

For my basis testing I've been using the 4 regular expressions that
were used by http://lh3lh3.users.sourceforge.net/reb.shtml
to compare the performance of various engines. The change I proposed
was not detrimental to any of those, and it did noticeably help one of
them. It increased the efficiency of the 3rd (Date) expression by
over 35 percent. This is based on elapsed time measured over running
it in series many times on the yahoo HTML file. During the testing
the data file was loaded into memory to avoid variances such as those
that could be caused by I/O operations.

------

PCRE benefitted from that "tweak" because it avoided the need for a
frame allocation every single time the match() function was invoked.
This is limited to expressions that require recursive processing, there
is no impact on expressions that do not require recursion to execute.
Now here is more to the story.

For a single exec() the match() function may be called repeatedly, and
with each call it may allocate and release heap frames many times when
executing an expression that requires recursive processing. There can
be a huge number recursions being done. On my Windows system the
elapsed clock time for the frame memory allocations and release was
exceeding all of the other work that PCRE was doing.

What was needed is for the pcre_exec() process to be able to retain
all previous frame allocations done by the match() function. Basically
whatever maximum allocations are done for simulating the recursion need
to be retained throughout the execute process. They would be released
only at the end of that process.

Attached is (hopefully) a GIF image showing a summary of test results.
It shows millisecond measurements of using the same 4 expressions
on the same 178K html data. The unmodified NO_RECURSE took (average)
over 300% more time, or basically about 4 times as long on that test
data. Applying the modification to retain recursive frame allocations
throughout the life of a call to pcre_exec() has reduced the process
to be only about 7% longer. Said differently, the modification reduced
the NO_RECURSE clock time by almost 75%. The 7% probably can't be
eliminated since it probably represents the match() function accessing
its variables through a frame pointer vs. them being on the stack.

It's a significant improvement. This modification brings the
NO_RECURSE option to be very competitive with stack-based recursion,
and of course it still means the avoidance of run-time stack faults.
The impact will vary depending upon how much recursion is needed for
executing a particular expression. Again, there is no impact on
expressions that do not require recursive processing.

I'm sending a patch to Philip instead of posting it here in order to
enable him to review and adjust whatever he wants.

How it was done:

1) I defined a new item in CONFIG.H to specify whether to retain the
previous frame allocation behavior or to use this new method. Perhaps
Philip will choose not to have an option of selecting each method.

2) A way was need to communicate the frame basis between pcre_exec()
and match() functions. While a new parameter could have been used,
I chose instead to add a (void *) pointer to the definition of the
"match_data" structure declared in pcre_internal.h.

3) In pcre_exec.c, a forward chain pointer was added to the "heapframe"
structure. This was key as it allows the match() function to re-use
the next frame if it's previously already been down to that recursion
depth at any time for the execution of that particular expression.

4) The RMATCH() and RRETURN() macros were modified to support this
modification.

5) The pcre_exec() function initializes a base frame and populates
a pointer to it in the "match_data" structure for use by the match()
function.
In 3 places where function pcre_exec() normally returns after having
called match() any number of times, the chain of allocated frames is
released prior to returning.
A new function was created to release the chain of allocated frames.


Regards,
Graycode