Author: Sheri
Date:
To: pcre-dev
Subject: Re: [pcre-dev] match point reset bug?

Philip Hazel wrote:
> On Sat, 12 Sep 2009, Sheri wrote:
>
>
>> Interesting, thanks for the detailed explanation. Seems odd however that
>> a lookbehind version works in 7.9?
>>
>> re> /(?<=abc)|(?<=def)/g
>> data> abcdefghi
>> 0:
>> 0:
>> data>
>>
>
> Lookbehind is different! It finds the empty string at the point at which
> it is starting the match. That is, in your example above, the match
> "bumpalong point" (to use Friedl's terminology) is 3. But if you use
>
> /abc\K|def\K/g
>
> the match happens when the bumpalong point is 0. The lookbehind version
> does indeed work in 7.9. Having found the empty match at offset 3, it
> fails to find a non-empty match, and so moves on by one character.
> Matches then fail until it reaches offset 6, at which point it finds the
> def match.
>
> With \K, however, doing the same thing doesn't work. Having found the
> empty match, it looks for a non-empty match starting at offset 3, and
> fails. So it moves to offset 4, and so misses the next match. That's why
> it has to ignore only the empty match at offset 3 itself, not one that
> is found later in the string. Is that clear? I know it's tricky!
>
>
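To make the difference concrete, here is a toy simulation in C of the /g-style loop described above. It is only a sketch under stated assumptions: the matcher is hard-coded for the two patterns in this thread, the TOY_* flags and all function names are hypothetical (they are not PCRE's option bits or API), and the anchored retry after an empty match is modelled on the usual /g idiom rather than taken from PCRE's source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flags for this toy model only; not PCRE's option bits. */
enum { TOY_NOTEMPTY = 1, TOY_NOTEMPTY_ATSTART = 2, TOY_ANCHORED = 4 };

/* Which of the two patterns from the thread to model. */
enum toy_pattern {
    TOY_RESET,      /* models  abc\K|def\K        */
    TOY_LOOKBEHIND  /* models  (?<=abc)|(?<=def)  */
};

/* True if "abc" or "def" starts at offset pos. */
static int is_word(const char *s, size_t len, size_t pos)
{
    return pos + 3 <= len &&
           (memcmp(s + pos, "abc", 3) == 0 || memcmp(s + pos, "def", 3) == 0);
}

/* Try one bumpalong point p. Every match in this model is empty;
   on success *at receives the offset of that empty match. */
static int try_at(enum toy_pattern pat, const char *s, size_t len,
                  size_t p, size_t *at)
{
    if (pat == TOY_RESET) {
        if (!is_word(s, len, p)) return 0;
        *at = p + 3;   /* the reset moves the reported start past the word */
        return 1;
    }
    if (p < 3 || !is_word(s, len, p - 3)) return 0;
    *at = p;           /* lookbehind: empty match at the bumpalong point */
    return 1;
}

/* One "exec" call: scan bumpalong points from start, honouring the flags. */
static int toy_exec(enum toy_pattern pat, const char *s, size_t len,
                    size_t start, int flags, size_t *at)
{
    for (size_t p = start; p <= len; p++) {
        size_t m;
        if (try_at(pat, s, len, p, &m)) {
            int rejected = (flags & TOY_NOTEMPTY) ||
                           ((flags & TOY_NOTEMPTY_ATSTART) && m == start);
            if (!rejected) { *at = m; return 1; }
        }
        if (flags & TOY_ANCHORED) break;   /* only the first point is tried */
    }
    return 0;
}

/* /g-style loop: after an empty match, retry at the same offset with
   retry_flag plus anchoring; if that fails, move on by one character.
   Match offsets are stored in out[]; the number found is returned. */
size_t toy_global(enum toy_pattern pat, const char *s, int retry_flag,
                  size_t *out, size_t max)
{
    size_t len = strlen(s), start = 0, n = 0;
    int flags = 0;

    while (start <= len && n < max) {
        size_t at;
        if (toy_exec(pat, s, len, start, flags, &at)) {
            out[n++] = at;
            start = at;
            flags = retry_flag | TOY_ANCHORED;  /* every match here is empty */
        } else if (flags != 0) {
            start++;                            /* retry failed: bump by one */
            flags = 0;
        } else {
            break;                              /* ordinary failure: done */
        }
    }
    return n;
}
```

Under these assumptions, running toy_global on "abcdefghi" with TOY_RESET and TOY_NOTEMPTY reports only offset 3, because the retry at offset 3 finds the empty match at 6 and rejects it. With TOY_NOTEMPTY_ATSTART the same call reports offsets 3 and 6. TOY_LOOKBEHIND reports 3 and 6 with either flag.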
>> I understand why you are making another option, but it sounds like as a
>> result all user apps that do multiple matching (and the C++ module) will
>> need to be modified to benefit. In fact if using a shared library, it
>> will need to be processed one way if using version 8.0 or later and
>> another if using 7.9 and earlier.
>>
>
> Aarrgghh. I had not thought of that. What I *had* thought of was that
> people might be using PCRE_NOTEMPTY for completely different purposes,
> and I did not want to break their applications.
>
>
>> Have you considered giving the new option value to the old functionality?
>>
>
> Oh dear, oh dear, this looks like I have to make a judgement as to which
> change will annoy the fewest people, remembering that the problem arises
> only if \K is used in a pattern that matches an empty string. Something
> like
>
> /ab\Kc|de\Kf/
>
> works fine in 7.9. I take your point about shared libraries etc. I am
> now in a quandary as to what is the best way to proceed.
>
> Anybody else on this list got any ideas? If I do as Sheri suggests, and
> give PCRE_NOTEMPTY_ATSTART the bit value that was PCRE_NOTEMPTY, and
> give PCRE_NOTEMPTY a new value, pre-compiled applications that use a
> shared library would automatically move to the new functionality, but
> any that actually wanted the PCRE_NOTEMPTY functionality would go wrong.
>
> [At this point, I went away and played Mozart (string quartet) for an
> hour. Helps clear the brain...]
>
> Hmm... having thought about this for a bit, I think I am going to stick
> with things the way they are. This is my reasoning:
>
> * If I change the functionality of the existing options bit, programs
> that use PCRE_NOTEMPTY for completely independent purposes (nothing to do
> with /g-style processing) will suddenly break on a PCRE upgrade.
>
> * On the other hand, programs which at present use PCRE_NOTEMPTY for
> /g-style processing are not working properly in the presence of \K
> patterns that match empty strings, a relatively rare situation (at
> least, that's my guess).
>
> It seems to me to be better to choose the action that allows people to
> improve the behaviour of their programs that are not working quite right
> over an action that breaks programs that are working well.
>
> I take the point about PCRE versions. But if programmers are changing
> their programs anyway, and want to remain compatible with previous
> releases, it isn't too hard to do something like
>
> #if PCRE_MAJOR >= 8
> options |= PCRE_NOTEMPTY_ATSTART;
> #else
> options |= PCRE_NOTEMPTY;
> #endif
>
> which, in any case, is exactly the same as what you have to do for any
> new option that is added.
>
> Philip
>
Will you or Craig be implementing the fix in the cpp wrapper?
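
The #if check quoted above is decided at compile time, so it reflects the headers the program was built against, not the shared library it later loads. One possible complement is a run-time check on the version string. The sketch below assumes the string has the "major.minor date" shape that pcre_version() returns; the helper names are hypothetical, and the two option values are passed in by the caller so that no bit values are hard-coded.

```c
#include <stdlib.h>

/* Extract the major version from a string shaped like "7.9 2009-04-11".
   strtol stops at the first non-digit, so only the major part is read. */
int pcre_major_from_string(const char *version)
{
    return (int)strtol(version, NULL, 10);
}

/* Hypothetical helper: pick the option to use when retrying after an
   empty match. The caller supplies both candidate option values. */
int empty_retry_option(const char *version, int notempty, int notempty_atstart)
{
    return pcre_major_from_string(version) >= 8 ? notempty_atstart : notempty;
}
```

A caller built against 8.0 headers might pass pcre_version(), PCRE_NOTEMPTY and PCRE_NOTEMPTY_ATSTART. This only helps when the new option is defined at build time; a program compiled against 7.9 headers still needs the #if guard.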