Re: [pcre-dev] Capture not reset inside recursion

Author: Zoltán Herczeg
Date:  
To: Pcre-dev@exim.org, ND
Subject: Re: [pcre-dev] Capture not reset inside recursion
I did more investigation:

Perl:
/(?:(?:(a)b)?\1)+/ matches abaa
/(?:(?:(ab))?\1)+/ does not match ababab

These pattern / input pairs match in PCRE2. I am pretty sure Perl rewrites (?:(P))? to ((?:P)?), which is valid in some cases, but not in all of them. ND, I think you have found a pretty nice Perl bug; maybe you could report it to them.
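To illustrate why such a rewrite would not always be equivalent, here is a minimal sketch using Python's re module. This is an assumption-laden illustration only: Python is neither Perl nor PCRE2, and nothing here shows what Perl actually does internally. It only shows that the two forms differ in whether group 1 is left unset or set to the empty string when the optional part is skipped, and that a backreference can observe the difference.

```python
import re

# (?:(ab))? leaves group 1 unset when the optional part is skipped,
# while ((?:ab)?) always sets group 1, possibly to the empty string.
inner = re.compile(r"(?:(ab))?x")
outer = re.compile(r"((?:ab)?)x")

inner_skipped = inner.match("x").group(1)    # group 1 is unset (None)
outer_skipped = outer.match("x").group(1)    # group 1 is the empty string
inner_taken = inner.match("abx").group(1)    # both forms agree when
outer_taken = outer.match("abx").group(1)    # the optional part matches

# In Python's re, a backreference to an unset group fails, while a
# backreference to an empty group matches the empty string.
inner_ref = re.match(r"(?:(ab))?\1x", "x")
outer_ref = re.match(r"((?:ab)?)\1x", "x")

print(inner_skipped, repr(outer_skipped), inner_taken, outer_taken)
print(inner_ref, outer_ref.group(0))
```

So the two forms agree whenever the optional part matches, and can only differ once something later in the pattern looks at the capture after the empty case was taken.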

Regards,
Zoltan

-------- Original message --------
From: Zoltán Herczeg < hzmester@??? (Link -> mailto:hzmester@freemail.hu) >
Date: 2021 June 6 07:21:30
Subject: Re: [pcre-dev] Capture not reset inside recursion
To: Pcre-dev@??? < nadenj@??? (Link -> mailto:nadenj@mail.ru) >
The title is misleading; that feature is a JavaScript thing:
/(?:(a)b|\1)+/ matches aba in Perl, but not in JavaScript.
Anyway, it looks like the problem here is that ()? clears the capturing bracket in Perl when the empty case is selected, while PCRE2 restores its previous value.
Matching /(?:(a)??b)+/ against abb also shows this difference: the capturing bracket is empty in Perl, while it is set to a in PCRE2.
Even more interesting, /(?:(?:(a))??\1)+/ also matches only aa, although the body of the ?? should not be matched in the second iteration.
Let's do some debugging:
Match /(?:(?{ print "<$1>" })(?:(a))??(?{ print "[$1]" })\1)+/ to aaa
Output:
<>[][a]<a>[][a]
In the second iteration, the capturing bracket contains a before the ?? is executed, and is reset to nothing after.
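The trace above can be reproduced with a small hand-written model. This is only a sketch under stated assumptions: match_lazy_loop and clear_on_skip are hypothetical names, the function hard-codes the single pattern /(?:(?:(a))??\1)+/ anchored at offset 0, and it is not how Perl or PCRE2 are implemented. It assumes that \1 fails when group 1 is unset, and it switches between "clear the capture when the empty case is selected" and "keep the previous value".

```python
def match_lazy_loop(text, clear_on_skip):
    # Toy model of /(?:(?:(a))??\1)+/ anchored at offset 0.
    # cap is the current value of group 1 (None means unset).
    trace = []

    def backref(pos, cap):
        # \1 fails when group 1 is unset, otherwise it must match cap at pos.
        if cap is not None and text.startswith(cap, pos):
            return pos + len(cap)
        return None

    def iteration(pos, cap):
        trace.append("<%s>" % (cap or ""))
        # Lazy ??: first try the empty case (skip the group).
        skipped = None if clear_on_skip else cap
        trace.append("[%s]" % (skipped or ""))
        end = backref(pos, skipped)
        if end is not None:
            return end, skipped
        # Then try the body (a), which sets group 1 to "a".
        if text.startswith("a", pos):
            trace.append("[a]")
            end = backref(pos + 1, "a")
            if end is not None:
                return end, "a"
        return None

    pos, cap, matched = 0, None, False
    while True:
        step = iteration(pos, cap)
        if step is None:
            break
        pos, cap = step
        matched = True
    return (text[:pos] if matched else None), "".join(trace)


cleared = match_lazy_loop("aaa", clear_on_skip=True)
restored = match_lazy_loop("aaa", clear_on_skip=False)
print(cleared)
print(restored)
```

With clear_on_skip=True the model matches only aa and produces the same trace as the Perl output above; with clear_on_skip=False it matches aaa, which is the match PCRE2 reports.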
You will not believe this, but /(?:(?:(?{ print "!" })(a))?\1)+/ matches aaa, just like PCRE2. The code block should have zero effect on the matching, yet it disables something (probably an optimization) and the pattern works as expected.
Is this a Perl bug?
Regards,
Zoltan
 
-------- Original message --------
From: ND via Pcre-dev < pcre-dev@??? (Link -> mailto:pcre-dev@exim.org) >
Date: 2021 June 6 00:44:08
Subject: [pcre-dev] Capture not reset inside recursion
To: Pcre-dev@??? (Link -> mailto:Pcre-dev@exim.org)
Here is the pcretest listing:
PCRE2 version 10.35 2020-05-09
/(?:(a)?\1)+/
aaa
0: aaa
Expected result:
0: aa
Perl result:
0: aa
--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev