Re: [pcre-dev] Partial match at end of subject

Author: ph10
Date:  
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] Partial match at end of subject
On Wed, 24 Jul 2019, ND via Pcre-dev wrote:

> In terms of multisegment matching, this might be put as: a hard partial match
> occurs when the current segment is not the last one and its content is not
> enough to determine exactly what match (or no match) the WHOLE subject would
> have from this start position.


Yes, more or less.

I have already decided that I need to rewrite pcre2partial because the
way things work has changed a lot since it was first written.

If I understand you correctly, your proposal would mean that every
non-anchored pattern would give an empty-string hard partial match at
the end of a non-matching segment, and would never return "no match".
I do not like this idea. This is how I see it:

1. A return of "match" means "the pattern has matched in this segment."

2. A return of "no match" means "this segment definitely cannot be part
of a match".

3. A return of "partial match" means "adding another segment may result
in a match starting in this segment" where "starting" means the point
from where characters are inspected.
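
As a rough illustration of these three outcomes, here is a toy model in
Python. It is only a sketch under strong assumptions: the "pattern" is a
plain non-empty literal string, and classify_segment() is a hypothetical
helper invented for this illustration, not part of PCRE2 and not how
PCRE2 works internally.

```python
def classify_segment(literal: str, segment: str) -> str:
    """Toy model: classify one segment against a non-empty literal pattern.

    Returns "match", "partial" or "no match". This is an illustration of
    the three outcomes only, not PCRE2's matching algorithm.
    """
    if not literal:
        raise ValueError("this toy model only handles non-empty literals")

    # 1. The pattern has matched in this segment.
    if literal in segment:
        return "match"

    # 3. Some start position in this segment inspects at least one
    #    character and runs off the end while still agreeing with the
    #    pattern, so another segment might complete a match.
    first_short_start = max(0, len(segment) - len(literal) + 1)
    for start in range(first_short_start, len(segment)):
        if literal.startswith(segment[start:]):
            return "partial"

    # 2. Nothing in this segment can be part of a match.
    return "no match"


print(classify_segment("abc", "xabcx"))  # match
print(classify_segment("abc", "xxab"))   # partial
print(classify_segment("abc", "xyz"))    # no match
```

Note that in this model a start position at the very end of the segment
inspects no characters, so it never produces a partial match.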

The tricky case is when the starting point is at the end of the
segment and the pattern might match an empty string, because an empty
string can be matched either at the end of the current segment or at the
start of the next segment (which are, of course, the same place in the
overall string). In this situation, we do not know whether the empty
match will happen or whether adding more characters will produce a
non-empty match. So in this very special case, "partial match" means
"there is going to be a match at this point, but until some more
characters are added, we do not know if it will be an empty string or
something longer".
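
To make the ambiguity concrete, the following Python sketch shows what
/c*/ matches in the whole string at the point where two segments join.
Python's re module has no partial matching, so this only demonstrates
the whole-subject result that the special-case "partial match" stands in
for; match_at_boundary() is a hypothetical helper for this illustration.

```python
import re

pattern = re.compile(r"c*")


def match_at_boundary(first: str, second: str) -> str:
    """Return what /c*/ matches at the join point of the two segments."""
    whole = first + second
    # Pattern.match() with a pos argument anchors the attempt at that offset.
    m = pattern.match(whole, len(first))
    # /c*/ can always match the empty string, so m is never None here.
    return m.group(0)


# The same boundary, with different following segments:
print(repr(match_at_boundary("ab", "ccd")))  # 'cc'
print(repr(match_at_boundary("ab", "d")))    # ''
```

In both cases a match at the boundary is certain; only its length
depends on the characters in the next segment.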

This is the /c*/ case. All patterns that can match an empty string
either contain no character-matching items (trivial example: //) or use
quantifiers with zero minima. I suspect this type of pattern is actually
very rare in practice, especially in multi-segment matching scenarios.

Philip

--
Philip Hazel