http://bugs.exim.org/show_bug.cgi?id=1032
Summary: With MULTILINE option, cannot correctly handle searching for $ over CRLF
Product: PCRE
Version: 8.00
Platform: x86-64
OS/Version: Linux
Status: NEW
Severity: bug
Priority: low
Component: Code
AssignedTo: ph10@???
ReportedBy: exim@???
CC: pcre-dev@???
The $ metacharacter is an anchor: it doesn't consume any characters.
Therefore the pattern "$" can only ever return zero-length matches. This report
concerns that pattern, and also more complex patterns that can return
zero-length matches.
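As a rough illustration (a simplified model written for this report, not
PCRE's actual code), the C function below only answers "does $ match at this
offset?" for a MULTILINE search under (*ANY). It never returns a length, so
every match it can produce is zero-length. It assumes that only CR and LF act
as newline characters; real (*ANY) also recognises further Unicode newline
characters. The name dollar_matches_at is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of multiline "$" under (*ANY): it succeeds at the end of
   the subject, or immediately before a CR or LF. Other newline characters
   that (*ANY) recognises are deliberately left out of this sketch.
   Nothing is consumed: the answer is just "matches here" or not. */
static bool dollar_matches_at(const char *subject, size_t len, size_t pos)
{
    assert(pos <= len);
    if (pos == len)
        return true;                    /* end of subject */
    return subject[pos] == '\r' || subject[pos] == '\n';
}
```

For the subject "ab\r\ncd" (length 6) this model reports matches at offset 2
(before the CR), offset 3 (before the LF) and offset 6 (end of subject). The
match at offset 3 is the one this report is about.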
For a MULTILINE search with the pattern "(*ANY)$" over a subject containing a
single embedded CRLF, you get zero-length matches at both the CR and the LF,
even though CRLF is a single newline sequence and there is only one line
ending. The reason for this is as follows:
After a non-zero-length match, the procedure is to resume searching
immediately after the match. After a zero-length match, the procedure is to
retry at the same position for a non-zero-length match and, if there is none,
to advance by one character.
So after the zero-length match at the CR, the retry fails (a "$" match is
always zero-length) and the search resumes at the LF. Under (*ANY) a lone LF
is also a valid line ending, so you get a second, erroneous match there.
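The sequence of events can be reproduced with the sketch below. It pairs a
simplified "$" model (CR and LF only; an assumption, not PCRE's
implementation) with a global-match loop following the procedure described
above. Because "$" can never produce a non-zero-length match, the retry step
always fails and is folded into a plain one-character advance. All function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified multiline "$" under (*ANY): end of subject, or before CR/LF. */
static bool dollar_matches_at(const char *s, size_t len, size_t pos)
{
    return pos == len || s[pos] == '\r' || s[pos] == '\n';
}

/* Unanchored search for "$" from start. It always succeeds, at the latest
   at the end of the subject. */
static size_t find_dollar(const char *s, size_t len, size_t start)
{
    assert(start <= len);
    while (!dollar_matches_at(s, len, start))
        start++;
    return start;
}

/* Global-match loop as described in the report: after a zero-length match
   the non-empty retry cannot succeed for "$", so the search resumes one
   character later. Match offsets go into out[]; the count is returned. */
static size_t find_all_naive(const char *s, size_t len,
                             size_t *out, size_t max)
{
    size_t n = 0, start = 0;

    while (start <= len && n < max) {
        size_t pos = find_dollar(s, len, start);
        out[n++] = pos;
        start = pos + 1;                /* advance a single character */
    }
    return n;
}
```

For "ab\r\ncd" (length 6), find_all_naive reports offsets 2, 3 and 6: one match
before the CR, the erroneous one before the LF, and one at the end of the
subject.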
In theory, the caller of pcre_exec could spot the CRLF at the zero-length match
position and advance two characters itself. However, this requires the caller
to know that CRLF is a single entity, which in turn would require it to parse
the start of the pattern for sequences such as (*ANY).
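A sketch of that caller-side workaround follows, under the same simplified "$"
model (CR and LF only, an assumption). The crlf_is_newline argument is a
hypothetical stand-in for the knowledge the caller would otherwise have to
obtain by inspecting the pattern. The loop is otherwise the naive one, except
that a zero-length match sitting on a CR followed by an LF advances past both
characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified multiline "$" under (*ANY): end of subject, or before CR/LF. */
static bool dollar_matches_at(const char *s, size_t len, size_t pos)
{
    return pos == len || s[pos] == '\r' || s[pos] == '\n';
}

/* Unanchored search for "$" from start; succeeds by the end of subject. */
static size_t find_dollar(const char *s, size_t len, size_t start)
{
    assert(start <= len);
    while (!dollar_matches_at(s, len, start))
        start++;
    return start;
}

/* Global "$" loop with the caller-side CRLF workaround. crlf_is_newline is
   hypothetical: the caller must somehow know CRLF is a single newline. */
static size_t find_all_crlf_aware(const char *s, size_t len,
                                  bool crlf_is_newline,
                                  size_t *out, size_t max)
{
    size_t n = 0, start = 0;

    while (start <= len && n < max) {
        size_t pos = find_dollar(s, len, start);
        out[n++] = pos;
        start = pos + 1;                /* default: advance one character */
        /* Zero-length match on a CR that begins a CRLF pair: skip the LF
           too, so the pair counts as a single line ending. */
        if (crlf_is_newline && pos + 1 < len &&
            s[pos] == '\r' && s[pos + 1] == '\n')
            start = pos + 2;
    }
    return n;
}
```

With crlf_is_newline set, "ab\r\ncd" yields offsets 2 and 6 only; with it
cleared, the loop behaves like the naive one (offsets 2, 3 and 6). The weak
point is exactly the one the report raises: something has to set that flag
correctly.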
A possible resolution would be for pcre_exec to report the zero-length match at
the CR _and_ somehow indicate that searching should resume two characters
further on. At present, ovector[0] and ovector[1] give the start and end
offsets of the match within the subject when there is a match, and -1, -1 when
there is none. In the no-match case, perhaps ovector[1] could instead indicate
the position at which to resume.
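To make the proposal concrete, here is a hypothetical model of it, not an
existing PCRE interface. retry_nonempty_at stands in for the pcre_exec retry
after a zero-length match: it returns -1 (the value of PCRE_ERROR_NOMATCH) and,
instead of leaving -1, -1, stores the resume offset in ovector[1], stepping
over a whole CRLF. The caller then continues from ovector[1] without knowing
the newline convention. The "$" model is the same simplified CR/LF-only one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified multiline "$" under (*ANY): end of subject, or before CR/LF. */
static bool dollar_matches_at(const char *s, size_t len, size_t pos)
{
    return pos == len || s[pos] == '\r' || s[pos] == '\n';
}

/* Unanchored search for "$" from start; succeeds by the end of subject. */
static size_t find_dollar(const char *s, size_t len, size_t start)
{
    assert(start <= len);
    while (!dollar_matches_at(s, len, start))
        start++;
    return start;
}

/* HYPOTHETICAL model of the proposed behaviour: the retry for a non-empty
   "$" match at pos always fails, but ovector[1] now tells the caller where
   to resume, treating a CRLF pair as one line ending. */
static int retry_nonempty_at(const char *s, size_t len, size_t pos,
                             int ovector[2])
{
    ovector[0] = -1;
    if (pos + 1 < len && s[pos] == '\r' && s[pos + 1] == '\n')
        ovector[1] = (int)(pos + 2);
    else
        ovector[1] = (int)(pos + 1);
    return -1;                          /* same value as PCRE_ERROR_NOMATCH */
}

/* Caller loop: resumes wherever ovector[1] says, with no newline logic. */
static size_t find_all_with_resume(const char *s, size_t len,
                                   size_t *out, size_t max)
{
    size_t n = 0, start = 0;
    int ovector[2];

    while (start <= len && n < max) {
        size_t pos = find_dollar(s, len, start);
        out[n++] = pos;
        if (pos == len)
            break;                      /* matched at end of subject: done */
        /* No match is expected from the retry; only ovector[1] matters. */
        (void)retry_nonempty_at(s, len, pos, ovector);
        start = (size_t)ovector[1];
    }
    return n;
}
```

In this model "ab\r\ncd" yields offsets 2 and 6, and "a\nb" yields offsets 1
and 3, with all CRLF knowledge kept on the matcher's side.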