Re: [pcre-dev] several messages

Author: ph10
Date:  
To: ND
CC: Pcre-dev
Old-Topics: Re: [pcre-dev] Document SKIP position before or equal start_offset
Subject: Re: [pcre-dev] several messages
On Mon, 17 Jun 2019, ND via Pcre-dev wrote:

> Chapter ISSUES WITH MULTI-SEGMENT MATCHING of pcre2partial.html includes item
> 2 with description how to process with lookbehind assertions.
>
> I think it's important to add to this algorithm a some words about "no match":
> If result of partial match is "no match" then last max_lookbehind characters
> of subject string must be keeped to next matching too. Matching of next
> segment must start with appropriate offset.


Yes, you are right. Thank you. I have done it.
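
For anyone following along, the bookkeeping described above might be sketched roughly as follows. This is only an illustration in Python, not code from PCRE2: the helper names carry_over and next_subject are hypothetical, the matching call itself is not shown, and max_lookbehind is assumed to be the value that pcre2_pattern_info() reports for PCRE2_INFO_MAXLOOKBEHIND, counted in characters.

```python
def carry_over(segment: str, max_lookbehind: int) -> tuple[str, int]:
    """After a "no match" on `segment`, return the tail that must be kept
    and the offset at which matching of the next subject should start.
    (Hypothetical helper, not part of any PCRE2 binding.)"""
    keep = min(max_lookbehind, len(segment))
    # Slice from an explicit index: segment[-0:] would return everything.
    tail = segment[len(segment) - keep:]
    return tail, len(tail)


def next_subject(prev_segment: str, new_segment: str,
                 max_lookbehind: int) -> tuple[str, int]:
    """Build the subject for the next match attempt: the retained tail
    followed by the new segment, plus the start offset to pass to the
    matcher so that lookbehinds can still see the retained characters."""
    tail, start_offset = carry_over(prev_segment, max_lookbehind)
    return tail + new_segment, start_offset


if __name__ == "__main__":
    subject, offset = next_subject("abcdef", "ghi", 2)
    print(subject, offset)
```

The start offset is what makes the retained characters visible to lookbehinds without allowing a match to begin inside them.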

> Sorry I only know a basic english and don't know many nuances of this
> language.


Not a problem. In fact, it is useful to have a non-native English
speaker study the documentation, because that helps to make it clearer
for everybody.

> When "loop is forcibly broken" I assume that there is no more loop. How can we
> return (backtrack) to it?


I hope that my new wording makes it clearer what happens. Whenever an empty
string is matched, matching continues with the next item in the pattern,
but this does not prevent subsequent backtracking into the group. If,
after a backtrack, it matches an empty string again, matching jumps to
the next item as before.

> Second of my little concern is that "X*\z" and "X*" both matches and matches
> are different.
> I understand why it is from procedural point of view. Know that is
> Perl-compatible. But from logical point of view is quite unnaturally.


Not everybody has the same logic. :-) Here is another example where
adding \z changes the outcome:

/.*?/
abc
0:

/.*?\z/
abc
0: abc

I cannot change this, of course.
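
The same behaviour can be seen in other backtracking engines. As a rough illustration only, here is the equivalent in Python's re module, assuming its \Z (end of subject only) as a stand-in for PCRE2's \z; the second pair of patterns is the "X*" case from the question, tried on a made-up subject "abXX".

```python
import re

subject = "abc"

# A lazy quantifier is satisfied by the empty string at the start.
lazy = re.match(r".*?", subject)

# With an end-of-subject anchor, the lazy quantifier has to keep
# extending until the anchor can match.
lazy_anchored = re.match(r".*?\Z", subject)

# "X*" on its own matches the empty string at offset 0 ...
star = re.search(r"X*", "abXX")

# ... but with the anchor the first successful start position is offset 2.
star_anchored = re.search(r"X*\Z", "abXX")

if __name__ == "__main__":
    for m in (lazy, lazy_anchored, star, star_anchored):
        print(m.start(), repr(m.group(0)))
```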

> Updated docs:
> If (*SKIP) is used inside a lookbehind to specify a new starting point that is
> not later than the starting point of the current match, it is ignored, and the
> normal "bumpalong" occurs.
>
> May be "it is ignored" replace with "this new starting point is ignored".
> Because it can be understand as "SKIP is ignored".


I have tried to make it clearer. Thanks for all your comments.

Philip

--
Philip Hazel