Re: [pcre-dev] Clearing documentation about infinite loops

Author: ph10
Date:  
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] Clearing documentation about infinite loops
On Sun, 16 Jun 2019, ND via Pcre-dev wrote:

> PCRE2 version 10.33 2019-04-16
> /(?:a|(?=b)|.)*\z/
> abc
> 0: abc
>
> May be docs need some clarification about what happened at that point.
> After lookahead assertion (?=b) matches, loop is not broken. It seems a
> backtracking occurs as if group was false.


The loop *is* broken after (?=b) - see below.

> And at first glance previous example is not well going together with this:
>
> PCRE2 version 10.33 2019-04-16
> /(?:a|(?=b)|.)*/
> abc
> 0: a


Both of these examples are compatible with Perl.

In the second example, the group matches "a" the first time round the loop.
The second time round the loop, "a" fails against "b", so the group matches
via (?=b). That iteration matched an empty string, so the loop is broken,
and there is nothing more in the pattern to match.

In the first example, the same thing happens, but after (?=b) is
matched, \z fails, so there is a backtrack into the last iteration of the
loop. The next alternative to try is ".", which of course matches "b". The
loop then runs a third time, this time with "." matching "c". Then there
are no more characters, so the next iteration fails and the matcher goes
on to match \z.
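
The rule described above (an iteration that matches an empty string ends
the loop, but the loop can still be backtracked into) can be sketched as a
toy backtracking matcher. This is only an illustration hard-wired to these
two patterns, not how PCRE2 is implemented; the names toy_match,
alternatives, tail and loop are made up for the example.

```python
def toy_match(subject, anchored_end):
    """Toy model of (?:a|(?=b)|.)* with an optional trailing end anchor.

    Hypothetical helper for illustration only; it is not PCRE2 code.
    Returns the matched text, or None if there is no match at offset 0.
    """
    n = len(subject)

    def alternatives(pos):
        # Yield the end offset of each alternative that matches at pos,
        # in pattern order: a, then (?=b), then "."
        if pos < n and subject[pos] == "a":
            yield pos + 1
        if pos < n and subject[pos] == "b":
            yield pos              # lookahead succeeds, consumes nothing
        if pos < n and subject[pos] != "\n":
            yield pos + 1          # "." (no dotall)

    def tail(pos):
        # Whatever follows the loop: either nothing, or the end anchor.
        if anchored_end and pos != n:
            return None
        return pos

    def loop(pos):
        # Greedy: try one more iteration before trying the tail.
        for end in alternatives(pos):
            if end == pos:
                # Empty iteration: the loop is broken, go straight on
                # to what follows instead of iterating again.
                result = tail(end)
            else:
                result = loop(end)
            if result is not None:
                return result
            # Otherwise backtrack into this iteration and try the
            # next alternative.
        return tail(pos)           # no further iteration is possible

    end = loop(0)
    return None if end is None else subject[:end]


print(toy_match("abc", anchored_end=False))
print(toy_match("abc", anchored_end=True))
```

With the subject "abc" the unanchored call stops after "a", because the
empty (?=b) iteration breaks the loop and the tail succeeds at once. The
anchored call has the tail fail at that point, so it backtracks into the
iteration, takes "." instead, and ends up matching the whole subject.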

You can see all this by making use of the "auto-callout" feature:

$ ./pcre2test -ac zz
PCRE2 version 10.34-RC1 2019-04-22
/(?:a|(?=b)|.)*\z/
    abc
--->abc
 +0 ^       (?:
 +3 ^       a
 +4 ^^      |
 +3 ^^      a
 +5 ^^      (?=
 +8 ^^      b
 +9 ^ ^     )
+10 ^^      |
+14 ^^      \z
+11 ^^      .
+12 ^ ^     )*
 +3 ^ ^     a
 +5 ^ ^     (?=
 +8 ^ ^     b
+11 ^ ^     .
+12 ^  ^    )*
 +3 ^  ^    a
 +5 ^  ^    (?=
 +8 ^  ^    b
+11 ^  ^    .
+14 ^  ^    \z
+16 ^  ^    End of pattern
 0: abc


/(?:a|(?=b)|.)*/
    abc 
--->abc
 +0 ^       (?:
 +3 ^       a
 +4 ^^      |
 +3 ^^      a
 +5 ^^      (?=
 +8 ^^      b
 +9 ^ ^     )
+10 ^^      |
+14 ^^      End of pattern
 0: a



Philip

--
Philip Hazel