On Sat, 8 Oct 2011, Herczeg Zoltán wrote:
> I played with capturing brackets inside a recursion and found a
> difference between Perl (5.10.1) and PCRE. I thought I might ask the Perl
> developers about it because the effect seems strange to me, but the
> first mail I found in their list archives is not exactly inviting:
> http://www.nntp.perl.org/group/perl.beginners/2011/10/msg118878.html
> Perhaps someone here knows the answer.
>
> /(.)(\1|a(?2))/ matches "bab" in PCRE but not in Perl.
> /\1|(.)(?R)\1/ matches "cbbbc" in PCRE but not in Perl.
>
> It seems to me that Perl does not see the contents of capturing
> brackets inside a recursion. The following test proves this:
>
> /(.)((?(1)c|a)|a(?2))/ matches "baa" in Perl but not in PCRE.
>
> What do you think: is this a bug or intended behaviour in Perl?
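
Python's standard re module has no (?R) or (?2) recursion, so as a rough illustration here is a hand-translated toy backtracking matcher for the three patterns above. It is a sketch under one stated assumption: mode "pcre" lets a recursive call see the outer capture values, while mode "perl" starts the recursion with every group unset, which is Zoltán's reading of the results. All names (group2_a, whole_b, search, and so on) are hypothetical, this is not how either engine is implemented, and it does not model PCRE's atomic recursion (which should not change the outcome for these three subjects).

```python
# Toy backtracking model (hypothetical, not PCRE's or Perl's real code) of the
# three quoted patterns. Each function yields every end position at which its
# piece of the pattern can match, so callers can backtrack through them.
from typing import Callable, Dict, Iterator

Caps = Dict[int, str]  # capture group number -> captured text


def enter_recursion(caps: Caps, mode: str) -> Caps:
    # ASSUMED semantics: "pcre" keeps outer captures visible inside the
    # recursion; "perl" starts the recursion with every group unset.
    return dict(caps) if mode == "pcre" else {}


def backref(s: str, pos: int, caps: Caps, n: int) -> Iterator[int]:
    # \n fails if group n is unset, otherwise it must match the captured text.
    if n in caps and s.startswith(caps[n], pos):
        yield pos + len(caps[n])


def lit(s: str, pos: int, ch: str) -> Iterator[int]:
    # A single literal character.
    if s[pos:pos + 1] == ch:
        yield pos + 1


def group2_a(s: str, pos: int, caps: Caps, mode: str) -> Iterator[int]:
    # Group 2 of /(.)(\1|a(?2))/
    yield from backref(s, pos, caps, 1)
    for p in lit(s, pos, "a"):
        yield from group2_a(s, p, enter_recursion(caps, mode), mode)


def group2_c(s: str, pos: int, caps: Caps, mode: str) -> Iterator[int]:
    # Group 2 of /(.)((?(1)c|a)|a(?2))/; (?(1)...) tests "is group 1 set?"
    yield from lit(s, pos, "c" if 1 in caps else "a")
    for p in lit(s, pos, "a"):
        yield from group2_c(s, p, enter_recursion(caps, mode), mode)


def whole_b(s: str, pos: int, caps: Caps, mode: str) -> Iterator[int]:
    # The whole of /\1|(.)(?R)\1/, where (?R) re-enters this function.
    yield from backref(s, pos, caps, 1)
    if pos < len(s):
        mine = {**caps, 1: s[pos]}  # (.) sets group 1 at this level
        for mid in whole_b(s, pos + 1, enter_recursion(mine, mode), mode):
            # The recursive call does not change this level's group 1.
            yield from backref(s, mid, mine, 1)


def pattern_a(s: str, pos: int, mode: str) -> Iterator[int]:
    if pos < len(s):
        yield from group2_a(s, pos + 1, {1: s[pos]}, mode)


def pattern_b(s: str, pos: int, mode: str) -> Iterator[int]:
    yield from whole_b(s, pos, {}, mode)


def pattern_c(s: str, pos: int, mode: str) -> Iterator[int]:
    if pos < len(s):
        yield from group2_c(s, pos + 1, {1: s[pos]}, mode)


Pattern = Callable[[str, int, str], Iterator[int]]


def search(pattern: Pattern, s: str, mode: str) -> bool:
    # Unanchored search: try every start position, accept the first match.
    return any(next(pattern(s, start, mode), None) is not None
               for start in range(len(s) + 1))


for mode in ("pcre", "perl"):
    print(mode, search(pattern_a, "bab", mode),
          search(pattern_b, "cbbbc", mode),
          search(pattern_c, "baa", mode))
# pcre True True False
# perl False False True
```

Under that assumption the toy model reproduces all three reported results, which is at least consistent with the idea that Perl's recursion starts with its own, empty set of captures.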

[I can confirm that Perl's behaviour is still the same in the latest
testing version, 5.15.3.]

Recursion was implemented in PCRE quite a long time before Perl
implemented it. There is a discussion entitled "Recursion difference
from Perl" in the pcrepattern documentation, but it is concerned with
the fact that PCRE treats recursive calls as atomic (as does Python),
whereas Perl does not.
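
To make the atomic point concrete, here is a similar toy sketch (hypothetical names, not either engine's code) using an anchored palindrome pattern, ^(.|(.)(?1)\2)$, chosen for illustration. With atomic=True a recursive call keeps only its first match, as described for PCRE; with atomic=False the caller may backtrack into the call, as Perl does.

```python
# Toy model (hypothetical) of atomic vs. backtracking recursion for the
# anchored palindrome pattern ^(.|(.)(?1)\2)$ .
from itertools import islice
from typing import Iterator


def group1(s: str, pos: int, atomic: bool) -> Iterator[int]:
    if pos < len(s):
        # Alternative 1: any single character.
        yield pos + 1
        # Alternative 2: (.) sets group 2, (?1) recurses, \2 repeats group 2.
        g2 = s[pos]
        inner = group1(s, pos + 1, atomic)
        if atomic:
            # An atomic recursive call keeps only its first successful match.
            inner = islice(inner, 1)
        for mid in inner:
            if s.startswith(g2, mid):
                yield mid + 1


def full_match(s: str, atomic: bool) -> bool:
    # ^ and $: the outermost group 1 (not itself a recursive call, so never
    # atomic here) must end exactly at len(s).
    return any(end == len(s) for end in group1(s, 0, atomic))


for word in ("aba", "abcba"):
    print(word, full_match(word, atomic=False), full_match(word, atomic=True))
# aba True True
# abcba True False
```

With "abcba" the inner call first matches the single character "b", the following \2 then fails against "c", and only the non-atomic version can go back into the call and try the longer "bcb".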

It looks like this is another difference. In the perlre man page, when
discussing recursion, it says: "Capture buffers contained by the pattern
will have the value as determined by the outermost recursion." However,
the documentation also says that a recursion call "treats the contents
of a capture buffer as an independent pattern that must match at the
current position". If "independent pattern" is meant literally, that
does seem to imply that each recursion has its own independent set of
capture buffers.

If nobody else posts here, I guess you'll have to ask the Perl
developers. I have in the past reported bugs (man perlbug) and had
useful responses and discussions (e.g. the recent issue about *THEN),
though it sometimes takes a while before anyone picks up the issue.

Philip
--
Philip Hazel