Re: [pcre-dev] (*THEN) works differently in Perl

Author: ph10
Date:  
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] (*THEN) works differently in Perl
On Mon, 8 Jul 2019, ND via Pcre-dev wrote:

> And if we disregard Perl's bugs, then it seems (*COMMIT) in Perl works in the
> following manner:
>
> 1. Backtracking can't move to the left of COMMIT (this is PCRE behaviour too)
> 2. If COMMIT occurs, then no advance of the match to any other position in the
> subject can happen, no matter whether other backtracking control verbs occur
> after COMMIT, or whether COMMIT occurs in an atomic group, negative
> lookaround, etc. (this is not implemented by PCRE)


There is also a difference in the way Perl handles repeated groups.
Consider:

Perl 5.030000 Regular Expressions
/\A(?:.(*COMMIT))*c/
    abcd
 0: abc


PCRE2 version 10.34-RC1 2019-04-22
/\A(?:.(*COMMIT))*c/
    abcd
No match


In Perl, the group repeat matches "abcd", but when "c" then fails to
match, it unwinds complete repetitions of the group. In PCRE2, there is
a backtrack onto (*COMMIT), so the match fails. It looks as if Perl
handles (*COMMIT) differently from normal backtracks, because it does do
ordinary backtracks into repeated groups:

Perl 5.030000 Regular Expressions
/\A(.{1,2})*X/
    AABBCX
 0: AABBCX
 1: C
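
For anyone who wants to reproduce the ordinary case without Perl to hand,
here is a minimal sketch using Python's standard re module as a stand-in
backtracking engine. That substitution is an assumption of this sketch, not
something tested in the thread, and the helper name run() is made up for
illustration. Python's re has no (*COMMIT), so only this verb-free pattern
can be checked this way.

```python
import re

# Same pattern and subject as the Perl run above. Python's re is only a
# stand-in engine here (an assumption); it has no backtracking control verbs.
PATTERN = r"\A(.{1,2})*X"
SUBJECT = "AABBCX"


def run(pattern: str, subject: str):
    """Return (whole match, final value of group 1), or None if no match."""
    m = re.match(pattern, subject)
    return None if m is None else (m.group(0), m.group(1))


if __name__ == "__main__":
    # The group first consumes "AA", "BB", "CX"; X then fails at the end of
    # the subject, so the engine backtracks *inside* the last repetition,
    # shrinking it to "C" so that X can match.
    print(run(PATTERN, SUBJECT))
```

The final value of group 1 shows that the last repetition was shortened
rather than discarded whole, which is the "ordinary backtrack into a
repeated group" referred to above.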


Adding {1,2} to the first example gives this:

Perl 5.030000 Regular Expressions
/\A(?:.{1,2}(*COMMIT))*c/
    abcd
No match


Having another backtrack point inside the group changes things, but then
I found this:

Perl 5.030000 Regular Expressions
/\A(?:(*COMMIT).)*c/
    abcd
No match


I give up!

Philip

--
Philip Hazel