Author: Philip Hazel
Date:
To: Nuno Lopes
CC: pcre-dev
Subject: Re: [pcre-dev] need help with a particular regex
On Thu, 13 Sep 2007, Nuno Lopes wrote:
> uhm, my idea was like: match until you can advance to the next state
Except for the fact that there is no concept of "states" in Perl-style
regex matching, that is what the atomic group construct expresses.
(?> ... something ... )
means "match within the () in the normal way, possibly backtracking,
etc., but once you pass the closing ), that's it: no going back". That is
also what (*PRUNE) means: when (*PRUNE) is encountered, all
previously-remembered backtracking positions are forgotten.
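The "no going back" rule can be seen in a small sketch. It uses Python
rather than PCRE, and the pattern and subject are illustrative choices
that do not come from this thread. It assumes Python's standard re
module, which accepts (?>...) from version 3.11; on older versions the
sketch swaps in the lookahead-plus-backreference emulation, which
behaves the same way because a lookahead is itself atomic. Python's re
has no (*PRUNE), so only the atomic group is shown.

```python
import re
import sys

SUBJECT = "abc"

# Ordinary group: "bc" is tried first, the final "c" then fails, so the
# matcher backtracks into the group, takes "b" instead, and succeeds.
plain = re.fullmatch(r"a(bc|b)c", SUBJECT)

# Atomic group: once the closing ) is passed with "bc", the alternative
# "b" is forgotten, so the failure of the final "c" cannot be repaired.
if sys.version_info >= (3, 11):
    atomic_pattern = r"a(?>bc|b)c"
else:
    # Assumption for older Pythons: emulate the atomic group with a
    # lookahead (never re-entered on backtracking) plus a backreference.
    atomic_pattern = r"a(?=(bc|b))\1c"
atomic = re.fullmatch(atomic_pattern, SUBJECT)

print(plain.group(1))   # b
print(atomic)           # None
```

The only difference between the two patterns is whether the group may
be re-entered after it has been left.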
I like to think of this kind of matching as a depth-first search of
a tree of possibilities. If you use (?>...) atomic groups, they wrap up
little bits of the tree so that the first way that is found through that
bit is chosen and cannot be changed, though you can still jump right back
over the group to a previous bit of the regex. What (*PRUNE) does is to
"fix" the current path through the tree up to the current point. You
can't go back and try any other paths.
> this case the constant string). e.g.:
> ((*SOMEOPT .+))\d+
> run on 'aa1234' would give \1 = 'aa'
That's what (.*?)\d+ would give if this is a standalone regex. But if
it's part of something longer, and what followed failed, it could also
set \1 to aa1, aa12, or aa123, while trying all the possibilities.
But if you use (?>(.*?)\d+) then it is forced to stick with aa (which it
finds first). This should be the same as (.*?)\d+(*PRUNE) in fact.
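The order in which those possibilities are tried can be sketched as a
tiny hand-written depth-first search. This is a model of the idea, not
PCRE's implementation: candidates, match_plain, match_atomic and
rest_ok are hypothetical names, and rest_ok is only a stand-in for
"whatever follows in a longer regex".

```python
import re

SUBJECT = "aa1234"
DIGITS = "0123456789"

def candidates(subject):
    # Yield (head, digits) pairs for (.*?)\d+ anchored at the start of
    # the subject, in the order a backtracking matcher would try them.
    # (The subject is assumed to contain no newlines.)
    for i in range(len(subject) + 1):      # lazy .*?: shortest head first
        j = i
        while j < len(subject) and subject[j] in DIGITS:
            j += 1
        for end in range(j, i, -1):        # greedy \d+: longest run first
            yield subject[:i], subject[i:end]

def match_plain(subject, rest_ok):
    # Ordinary group: keep backtracking until the rest accepts a split.
    for head, digits in candidates(subject):
        if rest_ok(head, digits):
            return head
    return None

def match_atomic(subject, rest_ok):
    # (?>(.*?)\d+): only the first way through the group is ever tried.
    first = next(candidates(subject), None)
    if first is not None and rest_ok(*first):
        return first[0]
    return None

heads = []
for head, _ in candidates(SUBJECT):
    if head not in heads:
        heads.append(head)
print(heads)   # ['aa', 'aa1', 'aa12', 'aa123']

# Hypothetical "what followed": it fails for every split except the one
# where \d+ has matched exactly "234".
def rest_ok(head, digits):
    return digits == "234"

print(match_plain(SUBJECT, rest_ok))    # aa1
print(match_atomic(SUBJECT, rest_ok))   # None

# The real engine agrees on the standalone case.
print(re.match(r"(.*?)\d+", SUBJECT).group(1))   # aa
```

With a follow-on that always succeeds, both functions return aa; they
differ only when the first split is rejected by what comes next.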
Philip
--
Philip Hazel, University of Cambridge Computing Service.