Re: [pcre-dev] match point reset bug?

Author: Philip Hazel
Date:  
To: Sheri
CC: pcre-dev
Subject: Re: [pcre-dev] match point reset bug?

On Sat, 22 Aug 2009, Sheri wrote:

> I didn't have it either. Just installed v5.10.0
>
> The following prints abc-def-ghi
>
> #!/usr/bin/perl
>
> $d = "abcdefghi"    ;
> $d =~ s/abc\K|def\K/-/g   ;
>
> print $d, "\n";


I have just committed a patch that deals with this issue. It is a
typical case of feature addition having unintended consequences. The
problem, in PCRE, is not a bug in the use of \K; it is concerned with
matching empty strings. Perl has special rules for what happens with /g
after an empty string match. It can't behave as normal, of course,
because it would just loop for ever. PCRE itself has no /g feature, but
pcretest tries to emulate Perl. What it did was to re-run the pattern at
the matching point, with PCRE_ANCHORED and PCRE_NOTEMPTY set. This made
it look for a non-empty match at the point where an empty match had
previously succeeded. If this failed, it moved on by one character.
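
The retry loop described above can be modelled roughly as follows. This
is only a sketch in Python, not the pcretest source: global_matches is
a hypothetical name, Python's re module stands in for PCRE, "anchored"
is emulated with match() at a given position, and PCRE_NOTEMPTY is
emulated by discarding an empty result. Real PCRE_NOTEMPTY makes the
matcher backtrack to look for a non-empty alternative, so the emulation
is only faithful for simple patterns such as x*.

```python
import re

def global_matches(pattern, subject):
    """Collect (start, end) spans using the retry-after-empty strategy."""
    regex = re.compile(pattern)
    spans = []
    pos = 0
    retry_after_empty = False
    while pos <= len(subject):
        if retry_after_empty:
            # Stand-in for PCRE_ANCHORED | PCRE_NOTEMPTY: match only at
            # pos, and treat an empty result as a failure.
            m = regex.match(subject, pos)
            if m is not None and m.start() == m.end():
                m = None
            if m is None:
                # No non-empty match here: move on by one character and
                # go back to an ordinary search.
                pos += 1
                retry_after_empty = False
                continue
        else:
            m = regex.search(subject, pos)
            if m is None:
                break
        spans.append(m.span())
        pos = m.end()
        retry_after_empty = (m.start() == m.end())
    return spans

print(global_matches("x*", "abxd"))
# [(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)]
```

Replacing each of those spans with "-" gives "-a-b--d-", so for a
pattern without \K the strategy never loops and never skips a match.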

Now, until the advent of \K, the only point in a subject string at which
an anchored pattern could match an empty string was at the start, so
this worked fine, and did exactly what Perl did. The arrival of \K
changes this: your pattern matches an empty string not at the start, and
this is still true even if it is anchored. Such a match should *not* be
ignored when processing /g, but PCRE_NOTEMPTY ignores all empty matches.

In order to make this work, I have had to invent a new PCRE option
called PCRE_NOTEMPTY_ATSTART, which locks out an empty match only at the
start of the subject string. An empty match further into the string is
accepted. If the pattern is anchored, the two options differ in their
behaviour only if \K is used. (An alternative would have been to modify
the behaviour of PCRE_NOTEMPTY, but that would have been an incompatible
change, and these always cause grief to someone.)
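
To make the difference concrete, here is a toy model in Python of the
anchored retry at offset 3 of "abcdefghi". Everything in it is an
assumption made for illustration: anchored_match is a hand-written
matcher for abc\K|def\K only (Python's re module has no \K), the two
constants are local stand-ins for the PCRE options rather than their
real values, and "start" is taken to mean the offset at which the match
attempt begins.

```python
NOTEMPTY = 1          # stand-in for PCRE_NOTEMPTY
NOTEMPTY_ATSTART = 2  # stand-in for PCRE_NOTEMPTY_ATSTART

def anchored_match(subject, start, options=0):
    r"""Toy anchored matcher for abc\K|def\K; returns (start, end) or None."""
    for literal in ("abc", "def"):
        if subject.startswith(literal, start):
            # \K resets the reported start of the match to the current
            # position, so the reported match is empty and lies just
            # after the literal.
            match_start = match_end = start + len(literal)
            is_empty = match_start == match_end
            if (options & NOTEMPTY) and is_empty:
                continue  # every empty match is rejected
            if (options & NOTEMPTY_ATSTART) and is_empty and match_start == start:
                continue  # only an empty match at the starting offset is rejected
            return (match_start, match_end)
    return None

print(anchored_match("abcdefghi", 3, NOTEMPTY))          # None
print(anchored_match("abcdefghi", 3, NOTEMPTY_ATSTART))  # (6, 6)
```

With the old option the empty match after "def" is thrown away, which
is why only one "-" was inserted; with the new option it is kept, and
the result is abc-def-ghi as in Perl.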

Now pcretest uses the new option, and behaves like Perl. This issue has
pushed me into installing Perl 5.10 for testing purposes (in parallel
with Perl 5.8 so I can try both) which is probably a Good Thing. I'm
going to re-arrange some of the tests so that a proper check of all of
them against 5.10 can be done.

Philip

--
Philip Hazel