Author: ph10
Date:
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] Partial match at end of subject

On Sat, 13 Jul 2019, ND via Pcre-dev wrote:
> At its core, \z is a positive lookahead assertion that wants to inspect the
> next character of the subject.
I must admit I had not thought of it like that. I considered it just to
be "are we at the end of the subject?".
> I propose the following algorithm (for PARTIAL_HARD only, disregarding the
> existence of PARTIAL_SOFT):
>
> . Are we at the end of the subject? If no, backtrack.
> . Is partial hard matching allowed? If no, continue matching.
> . Have we inspected any characters? If yes, return a partial match;
>   otherwise return "no match".
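
The three quoted steps can be sketched as a small decision function. This is only a toy model in Python, not PCRE2 code: the names `Outcome` and `decide_at_end`, and the idea of passing "have we inspected any characters" in as a boolean, are assumptions made for illustration.

```python
from enum import Enum


class Outcome(Enum):
    BACKTRACK = "backtrack"
    CONTINUE = "continue matching"
    PARTIAL = "partial match"
    NO_MATCH = "no match"


def decide_at_end(pos, subject_len, partial_hard, inspected_any):
    # Hypothetical model of the proposed handling of \z (PARTIAL_HARD only).
    # Step 1: are we at the end of the subject? If not, backtrack.
    if pos != subject_len:
        return Outcome.BACKTRACK
    # Step 2: is partial hard matching allowed? If not, continue matching.
    if not partial_hard:
        return Outcome.CONTINUE
    # Step 3: have we inspected any characters? If so, report a partial
    # match; otherwise report "no match".
    if inspected_any:
        return Outcome.PARTIAL
    return Outcome.NO_MATCH
```

Under this model, a bare \z tried at the end of "abc" with partial hard matching enabled has inspected nothing, so `decide_at_end(3, 3, True, False)` yields `Outcome.NO_MATCH`.
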
I have been experimenting with this. It "fixes" your first example:

    /\z/
    abc\=ph
    No match

Your third example is not a partial matching situation:

    /c*/aftertext
    ab\=ph
    0:
    0+ ab

This has found a complete (empty) match right at the start of the
subject, without hitting the end of the subject. However, when matching
starts at the end of the subject:

    /c*/aftertext
    ab\=ph,offset=2
    No match

Before the change, this would have given a complete match.

Your second example still gives a full match:

    /(?!\C)/aftertext
    ab\=ph
    0:
    0+

The reason is that the end-of-subject test happens inside the negative
assertion, where "no match" means "assertion is true".
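
To illustrate the inversion, here is a minimal Python sketch. It is a toy model rather than PCRE2 code, it ignores partial matching entirely, and it treats \C as "any one character" instead of one code unit. The function names are hypothetical.

```python
def any_char_matches(subject, pos):
    # Stand-in for \C: succeeds only if there is a character at pos.
    return pos < len(subject)


def negative_lookahead(inner_matched):
    # A negative assertion is true exactly when its contents fail to match.
    return not inner_matched


def first_match_offset(subject):
    # Try each start offset in turn, as an unanchored match would.
    for pos in range(len(subject) + 1):
        if negative_lookahead(any_char_matches(subject, pos)):
            return pos
    return None
```

For the subject "ab", the inner test only fails at offset 2, the end of the subject, so that is where the assertion becomes true and the empty match is found. This agrees with the empty "0+" line in the output above.
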
I am still not entirely convinced this change should be made. Zoltán,
what do you think? It would involve making changes to JIT, of course.