On Sat, 10 Aug 2019, ND via Pcre-dev wrote:
> I would appreciate it if we first reached a consensus on these suggestions
> before you make any rewrite of partial matching.
I do not intend to make any more changes to the partial matching code. I
may do further updates to the documentation.
> As a lookahead is an independent pattern, we can calculate its own
> maxlookbehind and add it to the outer lookbehind's own maxlookbehind. We don't
> care much whether that "total" lookbehind is exactly the minimum optimal value.
> We really must care that it is not less than this value.
If you don't care about optimal values, why bother with lookbehind at
all? Just retain the entire first segment after a partial match.
> > >/(?<=\A.)/info,allusedtext
> > >Capture group count = 0
> > >Max lookbehind = 1
> >
> >This is correct, because the lookbehind does indeed just move back by
> >one character.
> >
>
> Not correct. We already discussed that \A is at its core equal to (?<!.),
> regardless of how it is really written in PCRE code.
> It is a lookbehind assertion of length 1.
Unfortunately, we are talking about different things here. The length of
(?<=\A.) is 1 because it matches 1 character. When it is processed, the
current point is moved back by 1. There is then a check that the current
point is at the start, but no previous character is inspected.
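
For illustration only, here is a minimal sketch that uses Python's re module
as a stand-in for PCRE2 (an assumption made purely for the example; it is not
the PCRE2 implementation and it cannot report a max lookbehind value). It
lists the offsets in "abc" at which (?<=\A.) succeeds: the assertion steps
back one character and then tests whether that position is the start of the
subject.

```python
import re

# Python's re is only a stand-in for PCRE2 here. In both engines \A is a
# zero-width test of the current position against the start of the subject.
pattern = re.compile(r"(?<=\A.)")

# Collect every offset in "abc" at which the assertion succeeds.
offsets = [m.start() for m in pattern.finditer("abc")]
print(offsets)  # [1]
```

The only successful offset is 1, the position exactly one character after
the start of the subject.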
> Consider an example. Let the subject "abc" come in two segments, "ab" and "c".
> First we try to match segment "ab":
>
> /(?<=\Ab)c/
> ab\=ph
> No match
>
> After this we keep maxlookbehind=1
Why? It hasn't given a partial match. After no match you should throw
away everything.
> last symbols ("b") and concatenate them
> with the second segment:
>
> /(?<=\Ab)c/
> bc\=offset=1
> 0: bc
I do not see this, either with 10.33 or the current code. I see this:
PCRE2 version 10.33 2019-04-16
  re> /(?<=\Ab)c/
data> bc\=offset=1
 0: c
data>
which is correct, of course, for an independent match.
> It's a wrong result, as it's obvious that the whole subject must not match.
Yes, I see what you are saying. I think this shows that my attempt to
suggest using max_lookbehind for partial matching does not work for a
number of cases. In other words: max_lookbehind is not what you want it
to be. That means it cannot be used for partial matching.

I think the answer is to recommend that the entire previous segment is
kept after a partial match.

It is also the case that \A (and probably \G as well) are confusing and
useless in partial matching.
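
The difference between the two retention strategies can be sketched outside
PCRE2. The example below uses Python's re module, which has no partial
matching, so it only compares what happens when the subject "abc" arrives as
"ab" and "c" and the second match is run over the retained text plus the new
segment. The pos argument of search() plays the role of the pcre2test offset
modifier, and all variable names are hypothetical.

```python
import re

# Python's re is a stand-in for PCRE2 here; it has no partial matching.
pattern = re.compile(r"(?<=\Ab)c")
segments = ["ab", "c"]

# Reference result: match against the complete subject "abc".
reference = pattern.search("".join(segments))

# Strategy 1: retain only the last max_lookbehind characters of the
# previous segment, then start matching just after them.
max_lookbehind = 1
kept = segments[0][-max_lookbehind:]            # "b"
truncated = pattern.search(kept + segments[1], len(kept))

# Strategy 2: retain the entire previous segment.
kept_all = segments[0]                          # "ab"
full = pattern.search(kept_all + segments[1], len(kept_all))

print(reference)   # None
print(truncated)   # a match object for "c": a false positive
print(full)        # None
```

With only one retained character, \A sees the retained "b" as the start of
the subject and the match succeeds, which disagrees with the reference
result. Retaining the whole segment keeps the true start of the subject in
place, so the result agrees with the reference.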