Re: [pcre-dev] Partial match at end of subject

Author: ph10
Date:  
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] Partial match at end of subject
On Sat, 13 Jul 2019, ND via Pcre-dev wrote:

> PCRE2_PARTIAL_HARD is intended for multisegment matching. I think when this
> option is set it means: this subject IS incomplete, it's only a non-last part
> of a certain "entire" subject.


It was never intended to mean "this subject is incomplete", rather "this
subject MAY BE incomplete". However ...

> If I right understand you, then you assume that next example must be complete
> matched at the end of subject:
>
> PCRE2 version 10.33 2019-04-16
> /(?<=b)\z/
> ab\=ph
> Partial match: b
>               <

>
> But really PCRE output is a partial match. And I think it's correct.


I now understand why this is different from

  re> /\z/
data> ab\=ph
 0:

The reason for the different behaviour is that an empty string partial
match is never given. It is explicitly locked out. I went and read the
code; that always helps. On reaching \z the code does this:

. Are we at the end of the subject?   If no, backtrack
. Is partial matching allowed?        If no, continue matching
. Have we inspected any characters?   If no, continue matching
. Is it partial hard?                 If yes, return a partial match


"Continue matching" reaches the end of the pattern and so gives a full
match result.
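
The four checks above can be modelled as a small decision function. This is only an illustrative sketch, not the PCRE2 source: the names Action and at_backslash_z and the boolean parameters are hypothetical, and the final fall-through for soft partial matching is an assumption, since the list above only says what happens for partial hard.

```python
from enum import Enum


class Action(Enum):
    BACKTRACK = "backtrack"
    CONTINUE = "continue matching"
    PARTIAL = "return a partial match"


def at_backslash_z(at_end, partial_allowed, inspected_any, partial_hard):
    """Hypothetical model of the checks made on reaching \\z, in order."""
    if not at_end:
        return Action.BACKTRACK
    if not partial_allowed:
        return Action.CONTINUE
    if not inspected_any:
        # An empty-string partial match is never given.
        return Action.CONTINUE
    if partial_hard:
        return Action.PARTIAL
    # Assumption: the soft-partial case is not described above; this
    # sketch simply carries on matching.
    return Action.CONTINUE


# /\z/ on "ab" with partial hard: the attempt that starts at the end of
# the subject has inspected no characters.
plain = at_backslash_z(True, True, False, True)

# /(?<=b)\z/ on "ab" with partial hard: the lookbehind has inspected "b".
lookbehind = at_backslash_z(True, True, True, True)
```

Under this model, plain is Action.CONTINUE, so the end of the pattern is reached and a full (empty) match is reported, while lookbehind is Action.PARTIAL, which corresponds to the "Partial match: b" output quoted earlier.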

Returning a partial match when no characters have been inspected is
deliberately locked out, because otherwise there would always be an
empty partial match for any unanchored pattern.
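
To see why, consider a toy unanchored matcher for a literal string with partial-hard behaviour. This is a hypothetical sketch with made-up names (toy_match, lock_out_empty) and has nothing to do with how PCRE2 is actually implemented; it only shows that the last start position, at the end of the subject, would always yield an empty partial match if the lock-out were removed.

```python
def toy_match(literal, subject, lock_out_empty=True):
    """Toy unanchored, partial-hard matcher for a literal string.

    Returns ("full", start, end), ("partial", start, end) or None.
    """
    # An unanchored match is tried at every start offset, including
    # the offset just past the last character of the subject.
    for start in range(len(subject) + 1):
        pos = start
        matched = True
        for ch in literal:
            if pos >= len(subject):
                # End of subject reached with pattern still to match.
                inspected_any = pos > start
                if inspected_any or not lock_out_empty:
                    return ("partial", start, pos)
                matched = False
                break
            if subject[pos] != ch:
                matched = False
                break
            pos += 1
        if matched:
            return ("full", start, pos)
    return None
```

With the lock-out in place, toy_match("abc", "xyz") finds nothing and toy_match("abc", "xxab") gives a partial match covering "ab". With lock_out_empty=False, toy_match("abc", "xyz") reports an empty partial match at offset 3, even though nothing in the subject resembles the pattern.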

The only way this could be changed would be to make some kind of
exception for \z and I do not think that is a good idea.

Philip

--
Philip Hazel