http://bugs.exim.org/show_bug.cgi?id=1490
Summary: Awkward semantics of anchored patterns with lookbehind assertions
Product: PCRE
Version: N/A
Platform: All
URL: https://bitbucket.org/mmottl/pcre-ocaml/issue/3/zero-width-split-problem
OS/Version: All
Status: NEW
Keywords: work:small
Severity: wishlist
Priority: medium
Component: Code
AssignedTo: ph10@???
ReportedBy: markus.mottl@???
CC: pcre-dev@???
The URL associated with this report gives more background on this problem,
which makes PCRE hard to use through non-C bindings.
To implement "split" in a Perl-compatible way, it is necessary to check for
empty (null) matches as the function advances over the subject string. My
OCaml bindings for PCRE use the "ANCHORED" and "NOTEMPTY" flags for that
purpose. But this apparently causes problems when the user's pattern contains
lookbehind assertions, because a lookbehind is still allowed to inspect text
before the current position, even when the match is anchored.
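
As a rough illustration of the behaviour described above, here is a small sketch that uses Python's standard `re` module as a stand-in for PCRE (an assumption made purely for demonstration; it is not PCRE and not the OCaml bindings). `Pattern.match(subject, pos)` anchors the match at `pos`, yet a lookbehind can still see the characters before `pos`. The pattern and subject are made-up examples.

```python
import re

# Illustrative pattern: "b", but only if preceded by "a".
pat = re.compile(r"(?<=a)b")
subject = "ab"

# Anchored attempt at offset 1: the lookbehind still sees the "a" at index 0.
anchored_at_1 = pat.match(subject, 1)

# Same position, but with the subject truly starting at "b":
# nothing precedes the match start, so the lookbehind fails.
sliced = pat.match(subject[1:])

print(anchored_at_1 is not None)  # True
print(sliced is not None)         # False
```

The two results differ only because of what the lookbehind is allowed to see before the match position.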
It is in principle possible to work around this issue. I could change the API
of my OCaml bindings so that users can pass two offsets to all PCRE
functions: one for the beginning of the subject string and one for the match
position. The C stubs of the bindings would then have to add the first offset
to the string pointer before calling PCRE and, if that offset is non-zero,
shift the returned match indexes back by it afterwards.
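
The two-offset workaround can be sketched as follows, again with Python's `re` standing in for PCRE. The helper name `match_with_subject_start` and its parameters are hypothetical and chosen only for this illustration; slicing the string plays the role of advancing the C string pointer.

```python
import re

def match_with_subject_start(pattern, subject, subject_start, pos):
    """Hypothetical helper: anchored match at `pos`, treating
    `subject_start` as the true beginning of the subject string.
    Returns (start, end) in the original string's coordinates, or None."""
    # Analogous to adding the offset to the string pointer in a C stub.
    view = subject[subject_start:]
    m = pattern.match(view, pos - subject_start)
    if m is None:
        return None
    # Readjust the match indexes back to the original string.
    return (m.start() + subject_start, m.end() + subject_start)

pat = re.compile(r"(?<=a)b")
print(match_with_subject_start(pat, "ab", 0, 1))  # (1, 2)
print(match_with_subject_start(pat, "ab", 1, 1))  # None
```

With `subject_start` equal to `pos`, the lookbehind can no longer reach text before the match position, which is the effect a "split" implementation would want here.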
Besides being somewhat involved, this API change would likely be very
confusing to users, because they would then have to reason about two string
offsets.
I don't think the current ANCHORED semantics should be changed, because some
people may depend on them. But it would be great if there were another flag
that prevents lookbehind assertions from inspecting text before the start of
the match, essentially treating that position as the true start of the subject
string.