[pcre-dev] [Bug 1472] New: Regexp "(|ab)" not handled in acc…

Author: Jonathan S Shapiro
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1472] New: Regexp "(|ab)" not handled in accordance with documentation
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1472
           Summary: Regexp "(|ab)" not handled in accordance with
                    documentation
           Product: PCRE
           Version: N/A
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: bug
          Priority: low
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: shap@???
                CC: pcre-dev@???



I suspect that this is a documentation issue rather than an implementation or
algorithm issue, but it was surprising and it seems worth clarifying.

Given that '|' in PCRE specifies ordered choice, one would expect the regular
expression "(|ab)" to match any subject string with a zero-length match at the
first position, because the empty first alternative always succeeds. That is
not what pcregrep reports. Running

echo "ab" | pcregrep --color '(|ab)'

indicates that the matched string is "ab", which is an unexpected outcome.
Conversely:

echo "xab" | pcregrep --color 'x(|ab)'

indicates that the matched string is "x" (as expected).

I suspect (without strong confidence) that pcregrep is rejecting matches of
zero length. I note that

echo "ab" | pcregrep --color '(^|ab)'

also purports to match "ab", which seems to support the notion that zero-length
matches are rejected.

While this behavior is arguably sensible from the human perspective, it does
not conform to the specification of PCRE regular expressions. Either the
specification or the implementation should be corrected to bring them into
agreement.

Please note that I have not dug deeper into this. In particular, it seems
vaguely possible that the PCRE library is returning a result consistent with
the specification, which is in turn being filtered by pcregrep on the basis of
length.
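
One way to separate the two possibilities is to ask a regex engine directly
for the span of the first match, rather than looking at highlighted output.
The sketch below is only an illustration under a stated assumption: it uses
Python's "re" module as a stand-in, which is not PCRE, although it also treats
'|' as ordered choice. The helper name first_match is hypothetical. A real
check would make the equivalent call against the PCRE library itself.

```python
import re

# Hypothetical helper (not part of PCRE or pcregrep): return the span and
# text of the first match exactly as the regex engine reports it.
def first_match(pattern, subject):
    m = re.search(pattern, subject)
    return None if m is None else (m.span(), m.group(0))

# The three pattern/subject pairs from the commands in this report.
cases = [("(|ab)", "ab"), ("x(|ab)", "xab"), ("(^|ab)", "ab")]

for pattern, subject in cases:
    # A span whose start equals its end is a zero-length match.
    print(pattern, subject, first_match(pattern, subject))
```

In this stand-in engine the first and third cases produce a zero-length match
at offset 0, and the second produces "x". If the PCRE library behaves the same
way, the "ab" result would come from how pcregrep treats zero-length matches
and not from the matching algorithm.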

