Re: [pcre-dev] match point reset bug?

Author: Sheri
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] match point reset bug?
Philip Hazel wrote:
> On Sat, 22 Aug 2009, Sheri wrote:
>
>
>> I didn't have it either. Just installed v5.10.0
>>
>> The following prints abc-def-ghi
>>
>> #!/usr/bin/perl
>>
>> $d = "abcdefghi"    ;
>> $d =~ s/abc\K|def\K/-/g   ;
>>
>> print $d, "\n";
>>
>
> I have just committed a patch that deals with this issue. It is a
> typical case of feature addition having unintended consequences. The
> problem, in PCRE, is not a bug in the use of \K; it is concerned with
> matching empty strings. Perl has special rules for what happens with /g
> after an empty string match. It can't behave as normal, of course,
> because it would just loop for ever. PCRE itself has no /g feature, but
> pcretest tries to emulate Perl. What it did was to re-run the pattern at
> the matching point, with PCRE_ANCHORED and PCRE_NOTEMPTY set. This made
> it look for a non-empty match at the point where an empty match had
> previously succeeded. If this failed, it moved on by one character.
>
> Now, until the advent of \K, the only point in a subject string at which
> an anchored pattern could match an empty string was at the start, so
> this worked fine, and did exactly what Perl did. The arrival of \K
> changes this: your pattern matches an empty string not at the start, and
> this is still true even if it is anchored. Such a match should *not* be
> ignored when processing /g, but PCRE_NOTEMPTY ignores all empty matches.
>
> In order to make this work, I have had to invent a new PCRE option
> called PCRE_NOTEMPTY_ATSTART, which locks out an empty match only at the
> start of the subject string. An empty match further into the string is
> accepted. If the pattern is anchored, the two options differ in their
> behaviour only if \K is used. (An alternative would have been to modify
> the behaviour of PCRE_NOTEMPTY, but that would have been an incompatible
> change, and these always cause grief to someone.)
>
> Now pcretest uses the new option, and behaves like Perl. This issue has
> pushed me into installing Perl 5.10 for testing purposes (in parallel
> with Perl 5.8 so I can try both) which is probably a Good Thing. I'm
> going to re-arrange some of the tests so that proper check of all of
> them against 5.10 can be done.
>
> Philip
>
>
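
The difference between the two options can be sketched with a toy model of the /g loop described above. This is only an illustration under stated assumptions, not pcretest's code: Python's re module has no \K, so the hypothetical helper exec_once matches abc|def and then collapses the reported match to the empty string at its end; the NOTEMPTY and NOTEMPTY_ATSTART constants are stand-ins for the PCRE options; and "start" is taken to mean the offset at which the match attempt begins.

```python
import re

# Toy stand-in for /abc\K|def\K/. Python's re has no \K, so the consumed
# text is matched normally and the reported match is collapsed to the empty
# string at its end, which is the effect of a trailing \K.
CONSUMED = re.compile(r"abc|def")

NOTEMPTY = "notempty"                  # stand-in: reject every empty match
NOTEMPTY_ATSTART = "notempty_atstart"  # stand-in: reject only an empty match at the start offset


def exec_once(subject, offset, anchored=False, empty_rule=None):
    """Return the reported (start, end) of one match attempt, or None."""
    if anchored:
        m = CONSUMED.match(subject, offset)
    else:
        m = CONSUMED.search(subject, offset)
    if m is None:
        return None
    start = end = m.end()  # \K resets the reported start to the current position
    # Simplification: every alternative ends in \K, so after a rejection
    # there is no other way to match at this position.
    if empty_rule == NOTEMPTY and start == end:
        return None
    if empty_rule == NOTEMPTY_ATSTART and start == end == offset:
        return None
    return (start, end)


def global_match(subject, empty_rule):
    """Emulate /g: after an empty match, retry anchored with empty_rule set."""
    found = []
    offset = 0
    retry = False
    while offset <= len(subject):
        if retry:
            hit = exec_once(subject, offset, anchored=True, empty_rule=empty_rule)
            if hit is None:
                offset += 1  # the retry failed: move on by one character
                retry = False
                continue
        else:
            hit = exec_once(subject, offset)
            if hit is None:
                break
        found.append(hit)
        offset = hit[1]
        retry = hit[0] == hit[1]  # an empty match triggers the anchored retry
    return found


def substitute(subject, hits, replacement="-"):
    """Apply the replacement at each reported match, like s///g."""
    out, last = [], 0
    for start, end in hits:
        out.append(subject[last:start])
        out.append(replacement)
        last = end
    out.append(subject[last:])
    return "".join(out)


subject = "abcdefghi"
print(substitute(subject, global_match(subject, NOTEMPTY)))          # abc-defghi
print(substitute(subject, global_match(subject, NOTEMPTY_ATSTART)))  # abc-def-ghi
```

In this model the NOTEMPTY loop throws away the empty match reported at offset 6, because the anchored retry at offset 3 rejects every empty match. The NOTEMPTY_ATSTART loop keeps it, since that match is not at the retry's starting offset, and so reproduces the abc-def-ghi that Perl prints.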

Interesting, thanks for the detailed explanation. It seems odd, however,
that a lookbehind version works in 7.9:

re> /(?<=abc)|(?<=def)/g
data> abcdefghi

0:
0:
data>
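
One possible reading of why the lookbehind form is unaffected: its empty matches really do start where they are reported, so the anchored non-empty retry simply fails and the search resumes one character further on. A small check with Python's re module (a different engine from PCRE, used here only because it supports fixed-width lookbehind) shows the reported positions:

```python
import re

subject = "abcdefghi"
pattern = re.compile(r"(?<=abc)|(?<=def)")

# Each match is empty and begins exactly at the position it is found at.
spans = [m.span() for m in pattern.finditer(subject)]
print(spans)                      # [(3, 3), (6, 6)]
print(pattern.sub("-", subject))  # abc-def-ghi
```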

I understand why you are making another option, but it sounds like as a
result all user apps that do multiple matching (and the C++ module) will
need to be modified to benefit. In fact, if an app uses a shared library,
it will need to process matches one way with version 8.0 or later and
another way with 7.9 and earlier.

Have you considered giving the new option value to the old functionality?

Regards,
Sheri