Re: [pcre-dev] (*THEN) works differently in Perl

Author: ND
Date:  
To: Pcre-dev
Subject: Re: [pcre-dev] (*THEN) works differently in Perl
On 2019-07-01 10:28, ph10 wrote:
> On Sun, 30 Jun 2019, ND via Pcre-dev wrote:
>> PCRE2 version 10.33 2019-04-16
>> /\A(?:.|..)(*THEN)c/
>> abc
>> No match
>>
>> Perl is match "abc".
>> I suppose "next innermost alternative" is interpreted differently by
>> PCRE and Perl.
>> If so, may be PCRE should go Perl way in this matter?
>
> I think this is a bug in Perl and I will report it as such.



After reading this post
https://rt.perl.org/Public/Bug/Display.html?id=92898#txn-1227153
I am not sure that this is a Perl bug.
I suppose that two branches start from "(?:.|..)", and each of these
branches continues with the common tail through to the end of the pattern.
Here are the two branches:
1) .(*THEN)c
2) ..(*THEN)c

Let's look at the Perl debug output:


Matching REx "\A(?:.|..)(*THEN)c" against "abcd"
Intuit: trying to determine minimum start position...
   doing 'check' fbm scan, [1..3] gave 2
   Found floating substr "c" at offset 2 (rx_origin now 0)...
   (multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 0
    0 <> <abcd>               |   0| 1:SBOL /\A/(2)
    0 <> <abcd>               |   0| 2:BRANCH(4)
    0 <> <abcd>               |   1|  3:REG_ANY(8)
    1 <a> <bcd>               |   1|  8:CUTGROUP(10)
    1 <a> <bcd>               |   2|   10:EXACT <c>(12)
                              |   2|   failed...
                              |   1|  failed...
    0 <> <abcd>               |   0| 4:BRANCH(7)
    0 <> <abcd>               |   1|  5:REG_ANY(6)
    1 <a> <bcd>               |   1|  6:REG_ANY(8)
    2 <ab> <cd>               |   1|  8:CUTGROUP(10)
    2 <ab> <cd>               |   2|   10:EXACT <c>(12)
    3 <abc> <d>               |   2|   12:END(0)
Match successful!



So backtracking onto (*THEN) in BRANCH(4) caused that branch to fail
immediately, and matching jumped to BRANCH(7).
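
To make the two readings concrete, here is a minimal toy model in Python.
It is only a sketch of the two interpretations discussed in this thread,
not the real matching code of either engine. The function names perl_like
and pcre_like are made up for illustration, and "." is simplified to "any
one character". The second function assumes the PCRE2 reading in which a
(*THEN) that is not inside an alternation acts like (*PRUNE).

```python
# Toy model of /\A(?:.|..)(*THEN)c/ under the two readings in this thread.
# Hypothetical helper names; neither function is taken from Perl or PCRE2.

ALTERNATIVES = (1, 2)  # characters consumed by "." and ".." respectively


def match_tail(subject, pos):
    """Match the common tail "c" at pos; return the end offset or None."""
    if subject.startswith("c", pos):
        return pos + 1
    return None


def perl_like(subject):
    """The tail is treated as part of each branch: a failure after (*THEN)
    abandons only the current branch, and the next alternative is tried."""
    for width in ALTERNATIVES:
        if width > len(subject):
            continue  # this alternative cannot consume enough characters
        end = match_tail(subject, width)
        if end is not None:
            return subject[:end]
        # Backtracking onto (*THEN): give up this branch, try the next one.
    return None


def pcre_like(subject):
    """(*THEN) sits outside the group, so no alternation encloses it and it
    behaves like (*PRUNE): the first failure after it ends the attempt at
    this start position. \\A allows no other start position."""
    for width in ALTERNATIVES:
        if width > len(subject):
            continue  # the verb was never reached on this alternative
        end = match_tail(subject, width)
        if end is not None:
            return subject[:end]
        return None  # backtracking onto the verb: the whole attempt fails
    return None


if __name__ == "__main__":
    for s in ("abc", "abcd", "acx"):
        print(repr(s), "perl_like:", perl_like(s), "pcre_like:", pcre_like(s))
```

For the subject "abc" the first function finds "abc" through the second
alternative, while the second function gives up as soon as "c" fails after
the first alternative, which is the difference reported above.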