Re: [pcre-dev] Recursion with possessive quantifiers

Top Page
Delete this message
Author: Philip Hazel
Date:  
To: ND
CC: Pcre-dev
Old-Topics: [pcre-dev] Recursion with possessive quantifiers
Subject: Re: [pcre-dev] Recursion with possessive quantifiers
On Sat, 12 Feb 2011, ND wrote:

> Hi, Philip!
>
> When a possessive quantifier applyed to expression for example
> (?:<EXPR>)*+
> then recursion occurs and its value is equal to the number of sequential
> <EXPR> in input string.
> Why recurtion is so big? It seems there is no need to keep all
> intermediate backtrack points because PCRE have no possibility to return
> to them. Perhaps PCRE need to keep only one last point for backtracking to
> it if following peace of input string don't match <EXPR>.
>
> May be this is a way for optimization?


I've looked at this; unfortunately, I can't see how to optimize. A
pattern such as (abc)*+ is compiled as if it was (?>(abc)*) and then two
issues are raised:

(1) The code does not know that (abc)* is the only thing inside the (?>
parentheses; for example, it must cope with (?>(abc)*xyz) so it must be
able to backtrack within the (?> parentheses.

(2) Even if there is nothing else, it needs to back track. Consider the
pattern (?>(abc)*) and the subject string "abcabcabx". It has to
remember the start point, then try (abc); when that succeeds it remembers
that point and tries (abc) again; when that succeeds it remembers the
point and tries a third time. This time (abc) fails to match, so it has
to back up to the remembered point and then continue out of the (?>
parentheses. I suppose that theoretically it should be possible to
remember only the last back-up point, but because the saving is done by
recursive calls, I can't see how to do that.

Philip

--
Philip Hazel