[pcre-dev] [Bug 2182] New: Lookahead behaving as though a ma…

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2182] New: Lookahead behaving as though a match succeeded in a null-matching repeated group
https://bugs.exim.org/show_bug.cgi?id=2182

            Bug ID: 2182
           Summary: Lookahead behaving as though a match succeeded in a
                    null-matching repeated group
           Product: PCRE
           Version: 8.41
          Hardware: x86
                OS: Windows
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: ph10@???
          Reporter: tattarakis@???
                CC: pcre-dev@???


This is a bug report and feature request rolled into one :D

Consider the following simple expression matched against "aab":

^(?:(?=(\1?+a))(?=aab).){1,2}

\1 then = "aa", whereas only "a" is expected.

The first round succeeds, then in the second only the first lookahead matches.
You should therefore expect any environmental changes brought on by matching
the first lookahead (such as backreference setting) to be reset. If you allow
the group to consume a character, you get the expected result:

^(?:(?=(\1?+a))(?=aab).){1,2}

Now \1 = "a".

This brings me to my feature request: the reason we use such constructs is
that, currently, a group quantified with + or * (i.e. potentially endlessly)
stops matching as soon as an empty string is matched. This makes sense; the
engine is looking after us and trying to ensure we don't end up continuously
matching empty strings until the end of time. However, is it possible to tweak
this safeguard slightly so that if the state of the environment has changed
since the last round, such as if one or more backreference values have changed,
we can trust that the user knows what they're doing and continue matching? Or
perhaps introduce a more explicit way to invoke this highly desirable
behaviour?
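
To make the request concrete, here is a minimal Python sketch of the
empty-match safeguard and of the proposed relaxation. It is a toy model under
stated assumptions, not PCRE's implementation: repeat_group, make_step and
allow_state_change are hypothetical names, and make_step hand-emulates the
zero-width group (?=(\1?+a)) instead of running a regex engine.

```python
def make_step(subject):
    """Hand-written emulation of the zero-width group (?=(\\1?+a)).

    Hypothetical helper: returns a function that takes (pos, captures) and
    yields (new_pos, new_captures) on success or None on failure.
    """
    def step(pos, captures):
        prev = captures.get(1)
        end = pos
        # \1?+ : possessively take the previous capture if it matches here.
        if prev is not None and subject.startswith(prev, pos):
            end = pos + len(prev)
        # The literal 'a' must follow; no backtracking into \1?+.
        if subject.startswith("a", end):
            # Lookahead: position is unchanged, only group 1 is updated.
            return pos, {**captures, 1: subject[pos:end + 1]}
        return None
    return step


def repeat_group(step, pos, captures, allow_state_change=False, max_rounds=1000):
    """Repeat `step` like a group quantified with `*`.

    Default behaviour models the safeguard described above: stop as soon as
    a round matches the empty string. With allow_state_change=True
    (the assumed, proposed behaviour) an empty round is allowed to repeat
    as long as it changed the captures.
    """
    rounds = 0
    while rounds < max_rounds:
        result = step(pos, captures)
        if result is None:
            break
        new_pos, new_captures = result
        rounds += 1
        empty = new_pos == pos
        changed = new_captures != captures
        pos, captures = new_pos, new_captures
        if empty and not (allow_state_change and changed):
            break
    return pos, captures, rounds


current = repeat_group(make_step("aab"), 0, {})
proposed = repeat_group(make_step("aab"), 0, {}, allow_state_change=True)
print(current)
print(proposed)
```

In this model the relaxed loop still terminates on its own: a round that
matches nothing and changes nothing ends the repetition, and max_rounds is
only a backstop.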

If you let us quantify null-matching groups endlessly, you will open up
possibilities that, in general, can otherwise be achieved only with
variable-length lookbehinds (which, of course, are not supported). Think of all
the people who want VLLBs and would be delighted with this feature :P And I'm
guessing it wouldn't require nearly as much work as actually implementing them.

Thank you for your time.

- John "jaytea" Tattarakis
