https://bugs.exim.org/show_bug.cgi?id=2683
Bug ID: 2683
Summary: Pcretest on empty string with /g
Product: PCRE
Version: 8.44
Hardware: x86
OS: Windows
Status: NEW
Severity: bug
Priority: medium
Component: Code
Assignee: Philip.Hazel@???
Reporter: c_moi_l_master@???
CC: pcre-dev@???
Hello,
Although the handling of /g is left to the programmer in PCRE, PCRE's
documentation still recommends a Perl-compatible way to do it.
A relevant bit in pcre.txt:
"Finding all the matches in a subject is tricky when the pattern can match an
empty string. It is possible to emulate Perl's /g behaviour by first trying the
match again at the same offset, with the PCRE_NOTEMPTY_ATSTART and
PCRE_ANCHORED options, and then if that fails, advancing the starting offset
and trying an ordinary match again."
When applying the expression /a?a?/gC to the string "." in pcretest:
- The first match is found in 3 steps (3 callouts). Since it is an empty match,
pcretest (which I can only assume follows the above recommendation for /g)
tries another match at the same position with PCRE_NOTEMPTY_ATSTART and
PCRE_ANCHORED set. This attempt fails, also in 3 steps.
- The next position is then tried, and a second match is found in 3 steps.
So far it has taken 9 steps to find two matches.
Now the interesting part: in my view, and in that of the author of
regex101.com, the second match is also an empty match, so another attempt
should be made there with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED set.
pcretest does NOT do this, but regex101.com does (this can be seen in its
debugger, for a total of 12 steps).
Is the fact that pcretest does not make this last match attempt an issue with
pcretest, or is it consistent with Perl and therefore not an issue?
If it is not an issue, what causes pcretest to stop early there? Is it
because an empty match has been found at the end of the string, making this
just a small optimization?
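To make the two behaviours concrete, here is a minimal Python sketch of the
loop as I understand it from the documentation. It is not pcretest's code:
toy_match is a hand-written stand-in for pcre_exec() that only handles the
pattern a?a?, and find_all and the stop_after_empty_at_end switch are
hypothetical names that model the early stop I am guessing at. It counts match
attempts, not callouts.

```python
from typing import Callable, List, Optional, Tuple

Match = Tuple[int, int]  # (start offset, end offset)


def toy_match(subject: str, offset: int, retry_nonempty: bool) -> Optional[Match]:
    """Toy stand-in for pcre_exec() with the pattern a?a? (NOT real PCRE).

    retry_nonempty models PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED.
    a?a? can always match at the given offset, so anchoring changes nothing here.
    """
    end = offset
    # Greedily consume at most two 'a' characters.
    while end < len(subject) and end - offset < 2 and subject[end] == "a":
        end += 1
    if retry_nonempty and end == offset:
        return None  # an empty match at the start offset is not allowed
    return (offset, end)


def find_all(
    subject: str,
    match_at: Callable[[str, int, bool], Optional[Match]],
    stop_after_empty_at_end: bool,
) -> Tuple[List[Match], int]:
    """Emulate /g as described in pcre.txt; return (matches, attempt count)."""
    matches: List[Match] = []
    attempts = 0
    offset = 0
    retry = False
    while offset <= len(subject):
        attempts += 1
        m = match_at(subject, offset, retry)
        if m is None:
            if not retry:
                break  # an ordinary match failed: no more matches
            # The retry failed: advance by one character and go back to an
            # ordinary match (a real loop must also handle UTF-8 and CRLF).
            offset += 1
            retry = False
            continue
        matches.append(m)
        offset = m[1]
        retry = m[0] == m[1]  # empty match: retry at the same offset
        # Assumed shortcut: nothing non-empty can match at the very end.
        if retry and stop_after_empty_at_end and offset == len(subject):
            break
    return matches, attempts
```

With the subject "." this makes 3 attempts when the switch is on and 4 when it
is off, finding the same two empty matches either way. At 3 callouts per
attempt, that would correspond to the 9 steps I see in pcretest and the 12
steps shown by regex101.com, assuming the early stop is indeed what happens.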
Another observation concerns PCRE2 (see
https://github.com/firasdib/Regex101/issues/1236 ):
pcre2.txt contains the exact same passage about how to handle /g. However,
pcre2test does not seem to behave this way, because the expression
/(?<=(\G.{2}))(?!$)/g applied to the string "dfgdftrbrtdtr" reports
different captures than pcretest does.
I believe this is related to /g and empty matches, and that if PCRE2 (or
pcre2test) actually followed the same recommendation as PCRE, they
would produce the same captures.
Am I correct, or is the difference in captures unrelated to that (though it
could still be an issue)?