[pcre-dev] What is the expected behavior of /(?=..(*MARK:a))…

Author: Thanh Hong Dai
Date:  
To: pcre-dev
Subject: [pcre-dev] What is the expected behavior of /(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g
Hi,



When testing the behavior of (*SKIP) to understand its underlying
implementation, I constructed the following regex to verify my
understanding:



/(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g



Test input:



aaaaaaaaaaaaaaabbbbbbbbbbbaa



Assuming that (*SKIP) fails the attempt at the current starting position
and bumps along to the position where (*SKIP) was reached, I anticipated
two cases:



1)      (*MARK:a) somehow stores the position, so when (*SKIP) fails the
current attempt, it bumps along 2 characters ahead.


2)      (*MARK:a) is not backtracked into, so when (*SKIP) fails the current
attempt, it bumps along by 1 character as normal.




In either case, I expect there can only be at most one match at the end,
since it's the only place the look-ahead fails.
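
To make the two cases concrete, here is a minimal sketch in Python that
models only my assumptions above. It is not PCRE and does not reflect what
the engine actually does. The function name simulate and its bump parameter
are made up for illustration: bump=2 stands for case 1, bump=1 for case 2.
The sketch also assumes that once (*SKIP:a)(*FAIL) is reached, the second
alternative is never tried at that starting position.

```python
def simulate(text: str, bump: int) -> list[int]:
    """Return the start offsets of matches under the assumed model.

    Hypothetical model, not PCRE's real behaviour:
    - if at least two characters remain, (?=..) succeeds, (*SKIP:a)(*FAIL)
      is reached, and the whole attempt at this position is abandoned;
    - the next attempt starts `bump` characters further on;
    - otherwise the look-ahead fails and the second alternative '.' matches
      the single remaining character.
    """
    matches = []
    pos = 0
    while pos < len(text):
        if len(text) - pos >= 2:
            # Look-ahead succeeds: assume the attempt fails outright and
            # the '.' alternative is never tried from this position.
            pos += bump
        else:
            # Fewer than two characters left: the look-ahead fails, so the
            # '.' alternative matches one character here.
            matches.append(pos)
            pos += 1
    return matches


TEXT = "aaaaaaaaaaaaaaabbbbbbbbbbbaa"
case1 = simulate(TEXT, bump=2)  # case 1: bump along to the mark
case2 = simulate(TEXT, bump=1)  # case 2: normal bump-along
print("case 1 match offsets:", case1)
print("case 2 match offsets:", case2)
```

Under this model, neither case yields more than one match, and any match
can only be the last character of the input.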



However, it turns out that every character is matched. Running the debugger
on regex101 (https://regex101.com/r/dA9tI1/1) reveals that the engine tries
the first branch twice, then still goes on to try the second branch, which
succeeds.



What is the expected behavior here?