Re: [pcre-dev] *+ vs * and depth of recursion

Page principale
Supprimer ce message
Auteur: Philip Hazel
Date:  
À: Ralf Junker
CC: pcre-dev
Sujet: Re: [pcre-dev] *+ vs * and depth of recursion
On Tue, 28 Oct 2008, Ralf Junker wrote:

> a user of my applicatio has pointed out that the following pattern may
> fail due to excessive use of recursion on Windows, where the default
> stack space is much smaller than on Unix:
>
> ^("(?>""|\\"|[^"])*+",){7}"http
>
> This pattern, however, uses far less stack while it appears to do the
> same thing:
>
> ^("(?>""|\\"|[^"])*",){7}"http


Are you sure you have those the right way round? I would expect the
pattern with the possessive quantifier to use less stack rather than
more.

> The user argued that both intents are consistent and that PCRE should
> treat the *+ in the 1st pattern as just * like in the 2nd pattern and
> thus limit its depth of recursion and use of stack space.
>
> Would this be something feasable?


Seems unlikely for anything so general. PCRE does already treat
sequences like \d+X as \d++X because it is simple to check that X cannot
match \d. However, for something in parens that can be arbitrarily
complicated, this kind of optimization gets far too hard. (This comment
assumes the regexes above *are* the wrong way round.)

If the regexes are not the wrong way round above, please send some data
strings that they are matched against, so I can check out how much stack
they are using.

Philip

--
Philip Hazel