Re: [pcre-dev] *+ vs * and depth of recursion

Page principale
Supprimer ce message
Auteur: Ralf Junker
Date:  
À: pcre-dev
Sujet: Re: [pcre-dev] *+ vs * and depth of recursion
Philip Hazel wrote:

>> a user of my applicatio has pointed out that the following pattern may
>> fail due to excessive use of recursion on Windows, where the default
>> stack space is much smaller than on Unix:
>>
>> ^("(?>""|\\"|[^"])*+",){7}"http
>>
>> This pattern, however, uses far less stack while it appears to do the
>> same thing:
>>
>> ^("(?>""|\\"|[^"])*",){7}"http
>
>Are you sure you have those the right way round? I would expect the
>pattern with the possessive quantifier to use less stack rather than
>more.


No. Surprisingly, the 2nd pattern uses less stack than the 1st one. The 1st one actually crashes the application due to a stack overflow.

>> The user argued that both intents are consistent and that PCRE should
>> treat the *+ in the 1st pattern as just * like in the 2nd pattern and
>> thus limit its depth of recursion and use of stack space.
>>
>> Would this be something feasable?
>
>Seems unlikely for anything so general. PCRE does already treat
>sequences like \d+X as \d++X because it is simple to check that X cannot
>match \d. However, for something in parens that can be arbitrarily
>complicated, this kind of optimization gets far too hard. (This comment
>assumes the regexes above *are* the wrong way round.)
>
>If the regexes are not the wrong way round above, please send some data
>strings that they are matched against, so I can check out how much stack
>they are using.


Many thanks! I will send the file to you off list to avoid potential copyright violations.

Btw: The patterns are studied and the following compile options are set:

PCRE_CASELESS
PCRE_DOTALL
PCRE_MULTILINE
PCRE_UNGREEDY
PCRE_NEWLINE_CRLF

Ralf