http://bugs.exim.org/show_bug.cgi?id=1472
Summary: Regexp "(|ab)" not handled in accordance with documentation
Product: PCRE
Version: N/A
Platform: Other
OS/Version: All
Status: NEW
Severity: bug
Priority: low
Component: Code
AssignedTo: ph10@???
ReportedBy: shap@???
CC: pcre-dev@???
I suspect that this is a documentation issue rather than an implementation or
algorithm issue, but it was surprising and it seems worth clarifying.
Given that '|' in PCRE specifies ordered choice, one would expect the regular
expression "(|ab)" to match the empty string at the start of any subject,
because the first alternative offers a zero-length path to a match. That is
not what pcregrep reports. Running
    echo "ab" | pcregrep --color '(|ab)'
indicates that the matched string is "ab", which is an unexpected outcome.
Conversely:
    echo "xab" | pcregrep --color 'x(|ab)'
indicates that the matched string is "x" (as expected).
I suspect (without strong confidence) that pcregrep is rejecting matches of
zero length. I note that
    echo "ab" | pcregrep --color '(^|ab)'
also purports to match "ab", which seems to support the notion that zero-length
matches are rejected.
While this behavior is arguably sensible from the human perspective, it does
not conform to the specification of PCRE regular expressions. Either the
specification or the implementation should be corrected to bring them into
agreement.
Please note that I have not dug deeper into this. In particular, it seems
vaguely possible that the PCRE library is returning a result consistent with
the specification, which is in turn being filtered by pcregrep on the basis of
length.
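
A minimal sketch of the two readings, assuming Python's re module as a stand-in: it is also a backtracking engine with ordered alternation, but it is not PCRE, and nothing here shows what pcregrep does internally. The helper names first_match and first_nonempty_match are hypothetical. The first returns the leftmost match under plain ordered choice; the second emulates a caller that discards zero-length matches and retries at the same position, which needs Python 3.7 or later.

```python
import re

# Illustration only: Python's `re` stands in for a backtracking matcher with
# ordered alternation. It is not PCRE, and this is not pcregrep's code.
CASES = [("(|ab)", "ab"), ("x(|ab)", "xab"), ("(^|ab)", "ab")]


def first_match(pattern, subject):
    """Text of the leftmost match under ordered choice, empty or not."""
    return re.search(pattern, subject).group(0)


def first_nonempty_match(pattern, subject):
    """Text of the first non-empty match, or None (hypothetical filter).

    Relies on Python 3.7+ behaviour: after an empty match, the next search
    may start at the same position but must consume at least one character,
    so the matcher backtracks into the later alternatives.
    """
    for m in re.finditer(pattern, subject):
        if m.group(0):
            return m.group(0)
    return None


for pattern, subject in CASES:
    print(repr(pattern), repr(subject),
          "first:", repr(first_match(pattern, subject)),
          "first non-empty:", repr(first_nonempty_match(pattern, subject)))
```

In this emulation the plain search returns the empty string for "(|ab)" and "(^|ab)", while the non-empty variant returns "ab" for both and "x" for "x(|ab)". That agrees with the three pcregrep results above, so the output is at least consistent with a filter on zero-length matches, though it does not confirm one.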