Re: [pcre-dev] (*THEN) works differently in Perl

Author: ND
Date:  
To: Pcre-dev
Subject: Re: [pcre-dev] (*THEN) works differently in Perl
On 2019-07-09 13:53, ph10 wrote:
> On Mon, 8 Jul 2019, ND via Pcre-dev wrote:
> > And if we disregards Perl's bugs then it seems (*COMMIT) in Perl works in a
> > following manner:
> > 1. Backtracking can't move to the left of COMMIT (this is PCRE behaviour too)
> > 2. If COMMIT occurs then no advance match to any other position of subject can
> > happen. No matter there are any other backtracking control verbs occurs after
> > COMMIT or COMMIT occurs in atomic group/negative lookaround etc (this is not
> > implemented by PCRE)
>
> There is also a difference in the way Perl handles repeated groups.
> Consider
>
> In Perl, the group repeat matches "abcd", but when it then does not
> match "c", it unwinds complete repetitions of the group. In PCRE2,
> there is a backtrack onto *COMMIT, so it fails. Looks like Perl handles
> *COMMIT somehow differently to normal backtracks, because it does do
> ordinary backtracks into repeated groups:
>


No, I don't think Perl handles (*COMMIT) differently. Perl can match a
pattern of the form A*B by several methods. The common method is called
CURLYX-WHILEM, but some optimizations are applied in certain situations.
In particular, an optimized method called CURLYM is used when A is a
fixed-length group without captures. The CURLYM implementation is buggy:
it does not take the effect of (*COMMIT) into account.

Perl matches the patterns
/\A(?:.(*COMMIT))*c/
/\A(?:(*COMMIT).)*c/
using CURLYM, so it gets both cases wrong, as can be seen in Perl's debug
output. In the second case, though, the result happens to coincide with
the expected one.

The pattern
/\A(?:.{1,2}(*COMMIT))*c/
is matched with CURLYX-WHILEM, whose implementation does not have this bug.
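
To make the difference concrete, here is a toy Python model of the first
pattern, /\A(?:.(*COMMIT))*c/. It is only a sketch: the function names are
made up, the subject "abcd" is an assumption, and count_then_unwind is a
simplified guess at what a fixed-width repeat optimization does, not Perl's
actual CURLYM code.

```python
class CommitAbort(Exception):
    """Backtracking reached a (*COMMIT); the whole match must fail."""


def count_then_unwind(subject):
    """Simplified fixed-width repeat (assumption, not Perl's real code):
    count the repetitions, then give them back one at a time until the
    trailing 'c' matches. The (*COMMIT) verb is never consulted."""
    count = len(subject)  # '.' has width 1; newlines are ignored here
    while count >= 0:
        if subject[count:count + 1] == "c":
            return subject[:count + 1]
        count -= 1  # unwind one complete repetition of the group
    return None


def backtrack_with_commit(subject):
    """Ordinary backtracking: a failure after a repetition backtracks
    onto that repetition's (*COMMIT) and aborts the match."""
    def repeat(pos):
        if pos < len(subject):
            # Greedy: '.' matches, (*COMMIT) is passed, try to go on.
            end = repeat(pos + 1)
            if end is not None:
                return end
            raise CommitAbort()  # backtracked onto (*COMMIT)
        # No further repetition is possible: try the trailing 'c'.
        return pos + 1 if subject[pos:pos + 1] == "c" else None

    try:
        end = repeat(0)
    except CommitAbort:
        return None
    return None if end is None else subject[:end]


print(count_then_unwind("abcd"))      # the verb is ignored
print(backtrack_with_commit("abcd"))  # the verb is honoured
```

In this model the first function returns "abc" and the second returns no
match, which mirrors the Perl versus PCRE2 difference described in the
quoted message.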


I think the Perl developers should either fix the implementation of CURLYM
or process groups that contain (*COMMIT) with CURLYX-WHILEM.


What can PCRE do?
PCRE can either do nothing or change to process (*COMMIT) the way Perl
intends it:
1. If COMMIT occurs then backtracking can't move to the part of the pattern
that is to the left of it.
2. If COMMIT occurs then the start position can't be advanced.
These two principles apply regardless of whether any other backtracking
control verbs occur after COMMIT, or whether COMMIT occurs in an atomic
group, a negative lookaround, etc.

PCRE does not currently enforce them strictly.
For example, consider this pattern:

PCRE2 version 10.33 2019-04-16
/.?(?!(*COMMIT)x)a/
abc
0: a

The Perl way is "There can be no backtracking left of COMMIT". So the engine
can't backtrack into ".?", and the Perl result would be "no match".
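
The two readings can be compared with a tiny backtracking matcher in Python.
This is a sketch under stated assumptions, not PCRE or Perl code: the node
names, the run/search functions and the "strict" flag are all invented for
illustration, and the model only tracks a COMMIT that the lookahead body
backtracks onto.

```python
class CommitAbort(Exception):
    """Backtracking reached a (*COMMIT); abandon the whole match."""


def run(nodes, s, i, k, strict):
    """Match nodes against s from position i; call k(end) on success."""
    if not nodes:
        return k(i)
    (kind, arg), rest = nodes[0], nodes[1:]
    if kind == "char":
        return i < len(s) and s[i] == arg and run(rest, s, i + 1, k, strict)
    if kind == "opt_any":  # greedy .?
        if i < len(s) and run(rest, s, i + 1, k, strict):
            return True
        return run(rest, s, i, k, strict)
    if kind == "commit":
        if run(rest, s, i, k, strict):
            return True
        raise CommitAbort()  # principle 1: nothing to the left is retried
    if kind == "neglook":
        hit_commit = False
        try:
            inner = run(arg, s, i, lambda j: True, strict)
        except CommitAbort:
            inner, hit_commit = False, True  # the assertion body failed
        if not inner and run(rest, s, i, k, strict):
            return True
        if strict and hit_commit:
            raise CommitAbort()  # COMMIT stays in force outside the assertion
        return False
    raise ValueError(kind)


def search(nodes, s, strict):
    for start in range(len(s) + 1):
        ends = []
        try:
            if run(nodes, s, start, lambda j: ends.append(j) or True, strict):
                return s[start:ends[0]]
        except CommitAbort:
            return None  # principle 2: the start position is not advanced
    return None


# .?(?!(*COMMIT)x)a
PATTERN = [("opt_any", None),
           ("neglook", [("commit", None), ("char", "x")]),
           ("char", "a")]

print(search(PATTERN, "abc", strict=False))  # COMMIT confined to the assertion
print(search(PATTERN, "abc", strict=True))   # COMMIT applies to the whole match
```

With strict=False the model finds "a", like the PCRE2 10.33 output above;
with strict=True it reports no match, which is the result I expect from the
Perl way.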