[pcre-dev] [Bug 1490] Awkward semantics of anchored patterns…

Author: Philip Hazel
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1490] Awkward semantics of anchored patterns with lookbehind assertions

http://bugs.exim.org/show_bug.cgi?id=1490




--- Comment #1 from Philip Hazel <ph10@???> 2014-06-06 10:25:32 ---
On Thu, 5 Jun 2014, Markus Mottl wrote:

> In order to implement "split" in a PERL-compatible way, it is necessary to
> check for null patterns as the function advances over the subject string. My
> OCaml-bindings for PCRE use the "ANCHORED" and "NOTEMPTY" flags for that
> purpose. But this apparently causes problems when lookbehind assertions are
> used in the user's pattern, because they are still allowed to match before the
> current position, even when anchored.
>
> It is in principle possible to work around this issue. I could change the API
> of my OCaml-bindings such that users can pass two offsets to all
> PCRE-functions: one for the beginning of the subject string, one for the match
> position. The C-stubs of the bindings would then have to add the new offset to
> the string pointer before calling PCRE and, if non-zero, readjust the match
> indexes afterwards.


I don't understand your last sentence because pcre_exec() already has a
startoffset argument that does exactly this. Its use is demonstrated in
the pcredemo.c file that is part of the PCRE distribution.

> Besides being somewhat involved, this API-change would likely be very
> confusing to users, because they would now have to reason about two string
> offsets.
>
> I don't think the current ANCHORED-semantics should be changed, because some
> people may depend on it. But it would be great if there were another flag that
> would prevent lookbehind assertions from matching before the start of the
> match, essentially treating this position as the true start of the subject
> string.


I'm confused. If you want "this position" to be the true start of the
string, then surely you can just pass it as the subject string to
pcre_exec()? What am I missing here?

Regards,
Philip

