Re: [pcre-dev] What is the expected behavior of /(?=..(*MARK…

Author: Thanh Hong Dai
Date:  
To: 'Zoltán Herczeg'
CC: pcre-dev
Subject: Re: [pcre-dev] What is the expected behavior of /(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g
Dear Zoltan,

Thanks for your reply. That is exactly what I want to know.

Best regards,
Thanh Hong.


-----Original Message-----
From: Zoltán Herczeg [mailto:hzmester@freemail.hu]
Sent: Monday, 21 March, 2016 2:54 PM
To: Thanh Hong Dai <hdthanh@???>
Cc: pcre-dev@???
Subject: RE: [pcre-dev] What is the expected behavior of /(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g

>Does it mean that (*SKIP:label) looks for the (*MARK:label) in the regex execution stack to figure out where to bump along to?


Exactly. It searches the regex stack for the most recent MARK whose name matches and restarts the match from there.

E.g.: when /x(*:a)x(*:a)(*SKIP:a)(*FAIL)|./ is matched against xxy, the result is y, not x. If you delete the second (*:a), the result is x.

One more rule is that a SKIP can never go backwards (to avoid infinite match loops).
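
The lookup rule described above can be sketched roughly as follows. This is only a toy model in Python, not PCRE's actual code: the function name skip_target, the (name, offset) list used as the "MARK stack", and the choice to fall back to a one-character bump-along when the mark would not move the start position forward are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: each MARK passed on the current match path
# is recorded as (name, subject offset at which it was passed).
Mark = Tuple[str, int]


def skip_target(marks: List[Mark], name: str, start: int) -> Optional[int]:
    """Return where the next match attempt should start after (*SKIP:name).

    Returns None when no mark with that name is on the stack, meaning the
    SKIP is ignored.
    """
    # Walk the stack from the most recent mark to the oldest one.
    for mark_name, offset in reversed(marks):
        if mark_name == name:
            if offset > start:
                return offset
            # Assumption: a SKIP never moves the start position backwards
            # (or leaves it in place), so advance by one character instead.
            return start + 1
    return None


# /x(*:a)x(*:a)(*SKIP:a)(*FAIL)|./ against "xxy", attempt starting at 0:
# both marks are on the stack, the most recent one is at offset 2.
both_marks = skip_target([("a", 1), ("a", 2)], "a", 0)

# Same pattern with the second (*:a) deleted: only the mark at offset 1.
one_mark = skip_target([("a", 1)], "a", 0)
```

With both marks, the next attempt starts at offset 2, where only the second branch can match, giving y. With the second mark deleted, the next attempt starts at offset 1; the first branch fails there before it reaches the SKIP, so the second branch matches x.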

>------------------------
>Anyway, could anyone please explain the execution trace on regex101: Why is the first branch tried twice, then it proceeds to the second branch?


I know the interpreter matches it twice because it cannot check the MARK label stack for the name during matching; the "not found" condition is only caught once the whole stack has been unwound. The engine then restarts the match, this time knowing that (*SKIP:label) must be ignored. (This is just a brief overview of the concept; the implementation is a bit more complex.) PCRE-JIT does not have this disadvantage.

Regards,
Zoltan