[pcre-dev] [Bug 2683] Pcretest on empty string with /g

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2683] Pcretest on empty string with /g
https://bugs.exim.org/show_bug.cgi?id=2683

Philip Hazel <Philip.Hazel@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |ALREADY_FIXED


--- Comment #5 from Philip Hazel <Philip.Hazel@???> ---
PCRE1 (the 8.xx series) is obsolete and there is unlikely to be a new release.
PCRE2 (the 10.xx series) has been out for 6 years now. However, the behaviour
of pcre2test is the same for your first example. I see in the pcre2test code
there is this comment:

"We must now set up for the next iteration of a global search. If we have
matched an empty string, first check to see if we are at the end of the
subject. If so, the loop is over."

In other words this is a deliberate action in pcre(2)test. After all, if an
empty string has been matched at the end of the subject, there cannot be a
non-empty match at the same point.
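
To illustrate the rule in that comment, here is a rough sketch of a global
search loop in Python. It is not the pcre2test code: it uses Python's standard
re module, the function name global_matches is made up for this example, and
the "advance by one character" step is a simplification of what pcre2test does
after an empty match.

```python
import re

def global_matches(pattern, subject):
    """Collect (start, end) spans of successive matches, /g style.

    Hypothetical helper for illustration only; not taken from pcre2test.
    """
    regex = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(subject):
        m = regex.search(subject, pos)
        if m is None:
            break
        spans.append(m.span())
        if m.start() == m.end():
            # Empty match: if it is at the end of the subject, the loop is
            # over, because no non-empty match can start at that point.
            if m.end() == len(subject):
                break
            # Simplified assumption: bump along by one character.
            pos = m.end() + 1
        else:
            pos = m.end()
    return spans
```

With this sketch, global_matches(r"x*", "") returns [(0, 0)]: the single empty
match is at the end of the subject, so the loop stops instead of trying again.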

With regard to your second test - yes, pcre2test is different to pcretest, and
offhand I'm not sure why. However, pcre2test gives exactly the same result as
Perl:

$ ./pcre2test zz
PCRE2 version 10.36 2020-12-04
/(?<=(\G.{2}))(?!$)/g
dfgdftrbrtdtr
0:
1: df
0:
1: gd
0:
1: ft
0:
1: rb
0:
1: rt
0:
1: dt
$

$ perltest.sh zz
Perl v5.32.0

/(?<=(\G.{2}))(?!$)/g
dfgdftrbrtdtr
0:
1: df
0:
1: gd
0:
1: ft
0:
1: rb
0:
1: rt
0:
1: dt
$

Ah! I have found the change that was made for 10.32. This is from the
ChangeLog:

21. In both pcre2test and pcre2_substitute(), with global matching, a pattern
that matched an empty string, but never at the starting match offset, was not
handled in a Perl-compatible way. The pattern /(?<=\G.)/ is an example of such
a pattern. Because \G is in a lookbehind assertion, there has to be a
"bumpalong" before there can be a match. The automatic "advance by one
character after an empty string match" rule is therefore inappropriate. A more
complicated algorithm has now been implemented.

