Author: ph10
Date:  
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] Return last bumpalong offset in partial_hard matching

On Sat, 16 Feb 2013, ND wrote:

> Good day, Philip!
>
> Some days ago I send a request, but it is still no your answer. Do it please.


I saw your message, and it is still in one of my folders, waiting for me
to find some time to think about it. I am now getting back to doing some
work on PCRE, so I will think about it soon. Here are some first
thoughts.

Just to make sure I understand what you are suggesting: instead of
returning the earliest character that was inspected, you want it to
return the starting point of the last match attempt. Is that right? I
presume you then expect to use that offset minus the max lookbehind to
discover what characters to keep. Is that right?

Example: Suppose this is the data for a pattern with max lookbehind = 1,
and the startoffset is 2:

xxxxxxxxxxxxxxxxx
    ^
    ^
    There was a partial match, starting here (offset 4).


Previously, the returned offset would be 1 if it did a lookbehind at the
start. You are suggesting that it should instead be 4, is that right?
Then 4 - 1 = 3, so you need to keep from offset 3, whereas with the
offset that is returned at present you would have had to keep from
offset 1.
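
As a rough sketch of that arithmetic, here are the two ways of working
out the keep point written as C. The function names are invented for
illustration and are not part of PCRE; the only assumption beyond the
example above is that a keep point can never fall before offset 0.

```c
#include <assert.h>

/* Hypothetical helper, not a PCRE function: first offset to keep when
   the caller is told where the last match attempt started (the
   bumpalong point) and knows the pattern's maximum lookbehind. */
int keep_from_bumpalong(int bumpalong, int max_lookbehind)
{
    assert(bumpalong >= 0 && max_lookbehind >= 0);
    int keep = bumpalong - max_lookbehind;
    return keep < 0 ? 0 : keep;  /* never before the start of the data */
}

/* Hypothetical helper: with the existing return value the caller keeps
   everything from the earliest character that was inspected. */
int keep_from_earliest(int earliest_inspected)
{
    assert(earliest_inspected >= 0);
    return earliest_inspected;
}
```

With the numbers from the example, keep_from_bumpalong(4, 1) gives 3
and keep_from_earliest(1) gives 1.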

I can see that if the bumpalong point is a long way into the string, the
difference could be quite a bit.

One problem with making this change is that it is incompatible, and
there may be people who actually want the previous value, for whatever
reason. I am always very wary about making incompatible changes; however
straightforward they seem, somebody always seems to get caught out.

Perhaps the bumpalong offset should be returned in ovector[2] rather
than ovector[0].
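
To show how a caller might cope with that, here is a sketch in C. It
assumes a layout that is only a suggestion at this point: ovector[0]
keeps its present meaning and ovector[2] holds the start of the last
match attempt. The function name is made up, and the plain int array
stands in for a real ovector; nothing here calls PCRE.

```c
#include <assert.h>

/* Hypothetical helper, not a PCRE function. Assumed layout after a
   partial match:
     ov[0] = earliest inspected character (unchanged)
     ov[1] = end of the partial match
     ov[2] = start of the last match attempt (the suggested addition) */
int keep_point(const int *ov, int ovecsize, int max_lookbehind)
{
    assert(ov != 0 && ovecsize >= 2 && max_lookbehind >= 0);
    if (ovecsize < 3)
        return ov[0];            /* no third slot: behave as before */
    int keep = ov[2] - max_lookbehind;
    return keep < 0 ? 0 : keep;  /* never before the start of the data */
}
```

Code that only looks at ovector[0] would see no change, which is the
point of using a separate slot.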

Philip

--
Philip Hazel