Auteur: Philip Hazel Date: À: Ralf Junker CC: pcre-dev Sujet: Re: [pcre-dev] *+ vs * and depth of recursion
On Tue, 28 Oct 2008, Ralf Junker wrote:
> a user of my applicatio has pointed out that the following pattern may
> fail due to excessive use of recursion on Windows, where the default
> stack space is much smaller than on Unix:
>
> ^("(?>""|\\"|[^"])*+",){7}"http
>
> This pattern, however, uses far less stack while it appears to do the
> same thing:
>
> ^("(?>""|\\"|[^"])*",){7}"http
Are you sure you have those the right way round? I would expect the
pattern with the possessive quantifier to use less stack rather than
more.
> The user argued that both intents are consistent and that PCRE should
> treat the *+ in the 1st pattern as just * like in the 2nd pattern and
> thus limit its depth of recursion and use of stack space.
>
> Would this be something feasable?
Seems unlikely for anything so general. PCRE does already treat
sequences like \d+X as \d++X because it is simple to check that X cannot
match \d. However, for something in parens that can be arbitrarily
complicated, this kind of optimization gets far too hard. (This comment
assumes the regexes above *are* the wrong way round.)
If the regexes are not the wrong way round above, please send some data
strings that they are matched against, so I can check out how much stack
they are using.