Author: Graycode
Date:
To: pcre-dev
Subject: Re: [pcre-dev] Calculated match recursion stack size
On Fri, 20 Jan 2012, Philip Hazel wrote:
> I have committed an even simpler implementation, but I am unhappy about
> the figures it's giving me on this Linux box[*], so I have made it a bit
> more obscure. You have to run "pcretest -m -C" to make it show the
> value. And the output now uses the word "approximate".
This version also seems fine. With my normal MSVC build it shows:
PCRE version 8.30-Trunk-JAN20 2012-01-20
Compiled with
  8-bit support only
  UTF-8 support
  No Unicode properties support
  No just-in-time compiler support
  Newline sequence is LF
  \R matches all Unicode newlines
  Internal link size = 2
  POSIX malloc threshold = 30
  Default match limit = 50000
  Default recursion depth limit = 2500
  Match recursion uses stack: approximate frame size = 340 bytes
When compiled with optimization disabled for debugging (MSVC /Od), it shows:
Match recursion uses stack: approximate frame size = 936 bytes
(As before, the POSIX malloc threshold, match limit, and recursion
depth limit are non-standard values that I overrode in my config.h.)
Adding #define SUPPORT_UCP in config.h to a normal build yields:
PCRE version 8.30-Trunk-JAN20 2012-01-20
Compiled with
  8-bit support only
  UTF-8 support
  Unicode properties support
  No just-in-time compiler support
  Newline sequence is LF
  \R matches all Unicode newlines
  Internal link size = 2
  POSIX malloc threshold = 30
  Default match limit = 50000
  Default recursion depth limit = 2500
  Match recursion uses stack: approximate frame size = 360 bytes
With SUPPORT_UCP defined and optimization disabled for debugging, it shows:
Match recursion uses stack: approximate frame size = 1184 bytes
> [*] Adding more than a certain number of printf statements increased the
> apparent frame size; being simple-minded, I don't really understand why.
> I guess the ways of gcc are mysterious.
I can only describe MSVC compilers. If you were 'simple-minded' then
I'd be the blooming idiot Manuel from Fawlty Towers. Hopefully the
issues you've encountered are caused by compiling in debug mode.
Anyway, some PCRE users may run into these issues without having a
clue about what recursion is or how to code around it. Putting
something in place that provides even an approximation of the
real-world stack usage might at least give them a clue. The fix is
to use the existing PCRE match_limit_recursion feature, setting it so
that the worst-case recursion fits within the particular application's
available stack size. This new frame size figure may provide a
reasonably accurate estimate of where to set match_limit_recursion so
that the application gets a meaningful PCRE error code instead of
dying with a stack fault.
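
To make that concrete, here is a minimal sketch of the arithmetic I
have in mind. The helper name recursion_limit_for_stack, the stack
budget, and the safety margin are my own assumptions for illustration,
not anything PCRE provides. The frame size is the approximate figure
that pcretest reports, so treat the result as an estimate only.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of PCRE): estimate a value for
 * match_limit_recursion from the stack the application can spare.
 *
 *   stack_budget_bytes  - stack assumed available to the matching thread
 *   frame_size_bytes    - approximate frame size reported by pcretest
 *   safety_margin_bytes - stack held back for the rest of the call chain
 *
 * Returns 0 when there is no usable budget; the caller should then
 * avoid running the match rather than pass 0 on to PCRE.
 */
unsigned long recursion_limit_for_stack(size_t stack_budget_bytes,
                                        size_t frame_size_bytes,
                                        size_t safety_margin_bytes)
{
    if (frame_size_bytes == 0 || stack_budget_bytes <= safety_margin_bytes)
        return 0;

    /* Integer division rounds down, which errs on the safe side. */
    return (unsigned long)((stack_budget_bytes - safety_margin_bytes)
                           / frame_size_bytes);
}

/*
 * With the PCRE 8.x API the result would be applied roughly like this:
 *
 *   pcre_extra extra;
 *   memset(&extra, 0, sizeof(extra));
 *   extra.flags = PCRE_EXTRA_MATCH_LIMIT_RECURSION;
 *   extra.match_limit_recursion = limit;
 *   rc = pcre_exec(re, &extra, subject, length, 0, 0, ovector, ovecsize);
 *   if (rc == PCRE_ERROR_RECURSIONLIMIT) { ... report it, no stack fault ... }
 */
```

Assuming a 1 MiB stack with 64 KiB held back, the 340-byte frame from
my normal build gives a limit of 2891, while the 936-byte /Od frame
gives 1050, which is well below the 2500 I have configured.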
Granted, taking a stack fault and then rewriting the RegEx until it
no longer faults (or throwing more stack at the problem) is a lot
easier when that's an option.
I think very few real-world complex expressions would ever trigger a
stack fault. Even then it might not be critical to many PCRE users,
who may choose not to add any calculations to their code to
bullet-proof against it. But to me and maybe a few others, what
you're addressing here is very important.
If you get to a point where you feel having the stack calculation
introduces complexity or otherwise becomes unmaintainable, then I'm
OK with going back to hacking something into new PCRE releases for my
own uses. PCRE's continued ability to find the right stuff quickly
and reliably is more important to everybody.