Re: [pcre-dev] A pattern-matching detail: opinions sought

Author: Philip Hazel
Date:  
To: Herczeg Zoltán, Thorsten Schöning
CC: pcre-dev
Subject: Re: [pcre-dev] A pattern-matching detail: opinions sought
On Fri, 23 Sep 2011, Herczeg Zoltán wrote:

> I have to admit I don't exactly understand how the backtracking verbs
> work, but my understanding is that (a) should be done by
> (*PRUNE) according to http://perldoc.perl.org/perlre.html . Therefore
> I would vote for option (b).


On Fri, 23 Sep 2011, Thorsten Schöning wrote:

> >    Consider the pattern ^A(B(*THEN)C), where A, B, and C are
> >    complex patterns. If matching fails in C, do you expect that    
> >    (a) the entire match should fail, or
> >    (b) the matching should backtrack into A?  

>
> According to perlre I would expect (b), too, because the difference
> between THEN and PRUNE seems to be the possibility to backtrack vs.
> aborting the match, regardless of context and of whether there is
> something to backtrack into. In your example there is.


I'm glad that at least two people think the same way as I do. In fact,
what happens in Perl is (a). ("Note that if this operator is used and
NOT inside of an alternation then it acts exactly like the (*PRUNE)
operator.") However, if the pattern is changed to ^A(B(*THEN)C|(*FAIL))
the answer is (b). In other words, in the original pattern, because
there isn't a second alternative inside the parentheses, Perl does not
consider itself to be inside an alternation.

PCRE implements (b) because I interpret (A) as a group with just one
alternative, not as some different kind of group.
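
The two readings can be made concrete with a toy backtracking matcher.
This is only a sketch of the semantics discussed above, not code from PCRE
or Perl; every name in it (lit, seq, group, then_verb, fail_verb, build,
lone_then_prunes) is hypothetical. It assumes A = (?:a|ab), B = b, C = c
and the subject "abbc", so that C fails on the first try and can succeed
only if the matcher is allowed to backtrack into A.

```python
class Then(Exception):
    """Backtracking reached (*THEN): give up on the current alternative."""

class Prune(Exception):
    """Give up on the whole match attempt at this start position."""

def lit(s):
    # Match the literal s at position i, then run the continuation k.
    def m(text, i, k):
        return text.startswith(s, i) and k(i + len(s))
    return m

def seq(*parts):
    # Match the parts one after another, threading the continuation through.
    def m(text, i, k):
        def run(n, j):
            if n == len(parts):
                return k(j)
            return parts[n](text, j, lambda j2: run(n + 1, j2))
        return run(0, i)
    return m

def then_verb(text, i, k):
    # (*THEN) always matches going forward; it only acts on backtracking.
    if k(i):
        return True
    raise Then()

def fail_verb(text, i, k):
    # (*FAIL) never matches.
    return False

def group(branches, lone_then_prunes):
    # lone_then_prunes=True models reading (a): a (*THEN) in a group with a
    # single alternative acts like (*PRUNE). False models reading (b): the
    # group simply has one alternative, and running out of alternatives is
    # an ordinary failure that lets the caller backtrack.
    def m(text, i, k):
        for branch in branches:
            try:
                if branch(text, i, k):
                    return True
            except Then:
                if len(branches) == 1 and lone_then_prunes:
                    raise Prune()
        return False
    return m

def match_at_start(pattern, text):
    # Anchored match (the leading ^): only start position 0 is tried.
    try:
        return pattern(text, 0, lambda end: True)
    except Prune:
        return False

def build(second_alternative, lone_then_prunes):
    # ^A(B(*THEN)C) or, with second_alternative, ^A(B(*THEN)C|(*FAIL))
    a = group([lit("a"), lit("ab")], lone_then_prunes)
    branches = [seq(lit("b"), then_verb, lit("c"))]
    if second_alternative:
        branches.append(fail_verb)
    return seq(a, group(branches, lone_then_prunes))

reading_a = match_at_start(build(False, True), "abbc")    # False: match fails
reading_b = match_at_start(build(False, False), "abbc")   # True: backtracks into A
with_fail = match_at_start(build(True, True), "abbc")     # True under either reading
```

The model is deliberately simplified: a group here also catches a Then
raised by whatever follows it, which is harmless in this example because
there is only one (*THEN) in the pattern.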

Fortunately, I don't think this matters except when (*THEN) is used, so
in most cases Perl and PCRE do the same thing. The code of PCRE is such
that I think it would be extremely difficult to make it do (a) instead
of (b). The problem is that when it is compiling (*THEN) it does not
know whether another branch will follow later on. The code would have
to be patched up by rescanning the pattern after it has been compiled.

There are already a number of differences in the ways PCRE and Perl
operate. Unless I suddenly think of something clever, I intend to
document the difference in the pcrecompat man page, and point out that
PCRE treats (A) as if it were (A|(*FAIL)).

Philip

--
Philip Hazel