Re: [pcre-dev] What is the expected behavior of /(?=..(*MARK…

Author: Thanh Hong Dai
Date:  
To: 'Zoltán Herczeg'
CC: pcre-dev
Subject: Re: [pcre-dev] What is the expected behavior of /(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g
Hi,

> I think these questions are better suited to https://www.reddit.com/r/regex

I want the answer to be from the perspective of the implementation. Looking at the link, it looks similar to StackOverflow, with all the "write me a regex" questions instead of "how is this token implemented" ones.

> anyway, I think the /g causes the regex to match all characters. Without that it probably just matches only one character, as you can see in regex101.

I am well aware of the effect of `g`. I deliberately use the g flag to show all the matches. I want to test whether my regex has the same behavior as /..(*SKIP)(*FAIL)|./g
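
For reference, here is a minimal sketch of what I understand /..(*SKIP)(*FAIL)|./g to do. It is a hand-written toy model in Python, not PCRE code, and the function name skip_fail_matches is made up for illustration. It assumes the subject contains no newlines (so that `.` matches every character) and that backtracking onto (*SKIP) abandons the current attempt and restarts at the position where (*SKIP) was reached.

```python
def skip_fail_matches(subject):
    """Toy model of /..(*SKIP)(*FAIL)|./g; hypothetical helper, not PCRE code."""
    matches = []
    pos = 0
    while pos < len(subject):
        if pos + 2 <= len(subject):
            # First branch: ".." consumes two characters, then (*FAIL)
            # backtracks onto (*SKIP). The attempt at this start position
            # is abandoned and the next one starts where (*SKIP) was reached.
            pos += 2
        else:
            # Only one character is left, so ".." cannot match and the
            # second branch "." matches that character.
            matches.append(subject[pos])
            pos += 1
    return matches


print(skip_fail_matches("abcde"))  # ['e']
print(skip_fail_matches("abcd"))   # []
```

Under this model there is at most one match, at the very end of a subject with an odd number of characters.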

> The first half of your second assumption is correct, the (*MARK:a) has no effect since it is inside an assertion. However (*SKIP:a) has no effect if the 'a' is not found, so it is not a (*SKIP) in that case!


Does it mean that (*SKIP:label) looks for the (*MARK:label) in the regex execution stack to figure out where to bump along to?
------------------------
Anyway, could anyone please explain the execution trace on regex101: Why is the first branch tried twice, then it proceeds to the second branch?

Best regards,
Thanh Hong.

-----Original Message-----
From: Zoltán Herczeg [mailto:hzmester@freemail.hu]
Sent: Friday, 18 March, 2016 8:11 PM
To: Thanh Hong Dai <hdthanh@???>
Cc: pcre-dev@???
Subject: Re: [pcre-dev] What is the expected behavior of /(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g

Hi Thanh,

I think these questions are better suited to https://www.reddit.com/r/regex

anyway, I think the /g causes the regex to match all characters. Without that it probably just matches only one character, as you can see in regex101. The first half of your second assumption is correct, the (*MARK:a) has no effect since it is inside an assertion. However (*SKIP:a) has no effect if the 'a' is not found, so it is not a (*SKIP) in that case!

Hence your pattern is the same as /(?=..(*MARK:a))(*FAIL)|./ since 'a' is not found during backtracking.

Because the first alternative fails, we match the second, and that matches a single character.
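
The reduction above can be checked roughly with an engine that has no backtracking verbs at all. The following is only a sketch: it uses Python's re module, replaces (*FAIL) with the always-failing empty lookahead (?!), and leaves out (*MARK:a) and (*SKIP:a) on the assumption that neither has any effect in this pattern. The names SUBJECT and REDUCED are made up for the example.

```python
import re

# Test input from the original question.
SUBJECT = "aaaaaaaaaaaaaaabbbbbbbbbbbaa"

# (?!) is an empty negative lookahead that always fails; it stands in for
# (*FAIL). (*MARK:a) and (*SKIP:a) are omitted, assuming they are no-ops here.
REDUCED = re.compile(r"(?=..)(?!)|.")

# findall plays the role of /g: it collects every successive match.
matches = REDUCED.findall(SUBJECT)

# The first branch always fails, so "." matches once at every position.
print(len(matches) == len(SUBJECT))  # True
```

Every character is matched individually, which agrees with what regex101 shows for the original pattern.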

Regards,
Zoltan

Thanh Hong Dai <hdthanh@???> írta:
>Hi,
>
>
>
>When testing the behavior of (*SKIP) to understand its underlying
>implementation, I constructed the following regex to verify my
>understanding:
>
>
>
>/(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g
>
>
>
>Test input:
>
>
>
>aaaaaaaaaaaaaaabbbbbbbbbbbaa
>
>
>
>With the assumption that (*SKIP) fails the attempt at the current
>starting position and bumps along to the position where (*SKIP) is at, I
>anticipated two cases:
>
>
>
>1)      (*MARK:a) somehow stores the position, so when (*SKIP) fails the
>current attempt, it bumps along 2 characters ahead.
>
>2)      (*MARK:a) is not backtracked into, so when (*SKIP) fails the current
>attempt, it bumps along by 1 character as per normal.
>
>
>
>In either case, I expect there can only be at most one match at the
>end, since it's the only place the look-ahead fails.
>
>
>
>However, as it turns out, all characters are matched. Running the
>debugger on regex101 (https://regex101.com/r/dA9tI1/1) reveals that it
>tries the first branch twice, and manages to try the second branch and succeeds.
>
>
>
>What is the expected behavior here?
>
>--
>## List details at https://lists.exim.org/mailman/listinfo/pcre-dev