Re: [pcre-dev] Calculated match recursion stack size

Author: Philip Hazel
Date:  
To: Graycode
CC: pcre-dev
Old-Topics: [pcre-dev] Calculated match recursion stack size
Subject: Re: [pcre-dev] Calculated match recursion stack size
On Tue, 6 Dec 2011, Graycode wrote:

> This is a feature request to enable PCRE to calculate and tell the net
> impact on the stack for each recursive call in the internal match()
> function used by pcre_exec().
>
> Below is a modified version of what I've been using for some time.
> I tried to make this similar to other PCRE code style, hopefully it
> got close enough to seem familiar. There is still one "//" commented
> line that should have been removed.


I think I understand your patch, and I understand why you want the
information, but I'm afraid I don't like the way the patch works.

1. It uses static variables. These are a no-no because they are not
thread safe.

2. I don't understand the use of #ifdef PCRE_CONFIG_MATCH_RECURSION_STACK
because you have defined that macro, so the #ifdef will always be true.

3. I am not at all sure this should be a PCRE_CONFIG_xxx thing because
you can't change it by a config option.

4. If somebody links statically with pcre_config(), it will always drag
in pcre_exec(), even if they don't use that function in their
application, making their binary a lot larger than it should be.

5. There are places in pcre_exec() that have code like this:

      for (fi = min;; fi++)
        {
        int slength;
        RMATCH(eptr, ecode, offset_top, md, eptrb, RM14);
        if (rrc != MATCH_NOMATCH) RRETURN(rrc);
        if (fi >= max) RRETURN(MATCH_NOMATCH);
        if ((slength = match_ref(offset, eptr, length, md, caseless)) < 0)
          {
          CHECK_PARTIAL();
          RRETURN(MATCH_NOMATCH);
          }
        eptr += slength;
        }
      /* Control never gets here */


I cannot claim to know very much about the way stacks work in modern
compilers, but it seems to me that extra stack is needed for the slength
variable, and so the stack used before the call to RMATCH at that point
will be larger than at other points where there is no such declaration.
If I am right, it means that there is no single size that applies to all
calls. I won't be surprised to learn I am wrong, however.

6. I am not sure why your code allows for two additional ints. Where do
they come from?
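
To make points 1 and 5 concrete, here is a minimal sketch of one way a
per-call stack cost could be estimated without static variables. It is
not PCRE code: probe() and frame_size_estimate() are hypothetical names,
and the recursive function is only a stand-in for match(). It records
the address of a local at two successive recursion depths in memory
owned by the caller and reports the distance between them. The result
describes only this function on this compiler; whether a block-scoped
declaration such as slength changes the figure is compiler-dependent.

```c
#include <stdint.h>

/* Assumption: keep the compiler from inlining the recursion away. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical stand-in for a recursive matcher (not PCRE's match()).
   It stores the address of one of its locals for depths 0 and 1 in
   caller-provided memory, so no static variable is needed. */
NOINLINE static int probe(int depth, volatile uintptr_t *marks)
{
    volatile char here = 0;            /* a local that lives in this frame */
    int rc = 0;

    marks[depth] = (uintptr_t)&here;   /* remember where this frame sits */
    if (depth == 0)
        rc = probe(1, marks);          /* recurse exactly once */
    return rc + here;                  /* work after the call: no tail call */
}

/* Returns the distance in bytes between two successive frames of
   probe(). The stack may grow in either direction, so the absolute
   difference is taken. */
long frame_size_estimate(void)
{
    volatile uintptr_t marks[2] = { 0, 0 };

    (void)probe(0, marks);
    return marks[0] > marks[1] ? (long)(marks[0] - marks[1])
                               : (long)(marks[1] - marks[0]);
}
```

Because all state is in the marks array owned by the caller, two
threads can run frame_size_estimate() at the same time without
interfering with each other.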

After all those negatives, here is something positive: I *think* that
perhaps the way to approach this is to arrange for the "special" way of
calling pcre_exec() to be indicated differently, so that it can be
called directly from pcretest (or any other program) without going via
pcre_config. Unfortunately, there are very few option bits left, so I
would not want to use one of them on this rather rare and specialist
feature. However, something magic like "all option bits set" might be a
possibility. Then, in addition, the caller would have to pass over a
pointer to memory that fulfils the function of your static variable.
This could be cast via the regex argument, I suppose, though that's a
bit messy. The function would yield the answer.

Safety check: if all option bits are set, check that the regex pointer
does NOT point to a plausible regex, and that extra, subject, and offset
pointers are all NULL, and length, start_offset, and offset count are
all zero.
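
The calling convention and safety check described above could be
sketched roughly as follows. Everything here is an assumption made for
illustration: toy_exec(), struct toy_probe, TOY_MAGIC, TOY_ALL_OPTIONS
and TOY_ERROR_BADOPTION are hypothetical names, the argument list only
loosely mirrors the arguments mentioned in the text, and the value
written by the special call is a placeholder for a real measurement.

```c
#include <stddef.h>

#define TOY_ALL_OPTIONS     (~0)          /* hypothetical: all option bits set */
#define TOY_ERROR_BADOPTION (-3)          /* hypothetical error code */
#define TOY_MAGIC           0x50435245UL  /* hypothetical tag of a real regex */

/* Caller-owned memory that plays the role of the static variable.
   The first field overlays the place where a compiled regex would
   carry its tag, so it must NOT look like one. */
struct toy_probe {
    unsigned long tag;
    long frame_size;
};

/* Placeholder for the real measurement; the constant is made up. */
static long toy_measure_frame(void)
{
    return 128;
}

int toy_exec(const void *re, const void *extra, const char *subject,
             int length, int start_offset, int options,
             int *offsets, int offsetcount)
{
    if (options == TOY_ALL_OPTIONS) {
        /* The "bit messy" cast: the regex argument is really scratch memory. */
        struct toy_probe *p = (struct toy_probe *)re;

        /* Safety check: reject anything that looks like a normal call. */
        if (p == NULL || p->tag == TOY_MAGIC ||
            extra != NULL || subject != NULL || offsets != NULL ||
            length != 0 || start_offset != 0 || offsetcount != 0)
            return TOY_ERROR_BADOPTION;

        p->frame_size = toy_measure_frame();
        return (int)p->frame_size;        /* the function yields the answer */
    }

    /* Normal matching would happen here; omitted in this sketch. */
    return 0;
}
```

A caller would zero a struct toy_probe, pass its address as the regex
argument with every other pointer NULL and every count zero, and read
the answer from the return value. Passing a subject string or a block
whose tag equals TOY_MAGIC makes the special call fail.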

I am not particularly happy about this, but I will do some experiments.

Philip

--
Philip Hazel