Re: [pcre-dev] several messages

Author: Philip Hazel
Date:  
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] several messages
On Mon, 23 Jan 2012, ND wrote:

> No. If PCRE cannot calculate the length of the longest lookbehind in the
> pattern, then the main application must know that the string returned for a
> partial match may not be long enough, and that more characters may need to
> be kept.
>
> If PCRE can calculate the length of the longest lookbehind in the pattern,
> then it can simply return it. A value of 0 means that no lookbehinds are
> present in the pattern.


PCRE can easily calculate the length of the longest lookbehind. That is
not a problem. It can return it to the application via a PCRE_INFO_xxx
call. I think a negative value should mean no lookbehinds, because a
lookbehind of length 0 is permitted.
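
To illustrate how an application might use such a value, here is a minimal
Python sketch. It is not PCRE code: the function name, its parameters, and
the "negative means no lookbehinds" convention are assumptions taken from the
suggestion above. It only shows how many characters before a partial match
would need to be retained when the next chunk of input arrives.

```python
def retained_tail(buffer: str, match_start: int, max_lookbehind: int) -> str:
    """Return the part of `buffer` to keep after a partial match.

    `match_start` is the offset where the partial match began.
    `max_lookbehind` is the (hypothetical) value reported for the pattern:
    negative means "no lookbehinds", 0 or more is the longest lookbehind.
    """
    if max_lookbehind < 0:
        # No lookbehinds at all: nothing before the match start is needed.
        keep_from = match_start
    else:
        # Keep enough earlier characters for the longest lookbehind,
        # without running off the start of the buffer.
        keep_from = max(0, match_start - max_lookbehind)
    return buffer[keep_from:]


if __name__ == "__main__":
    print(retained_tail("xyzabc", 3, 2))   # keeps two characters before "abc"
    print(retained_tail("xyzabc", 3, -1))  # keeps only the partial match
```

In this sketch a value of 0 and a negative value retain the same text; the
distinction only tells the application whether any lookbehind exists.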

I think we have not yet got this fully understood.

If zero-length partial matches are allowed whenever there is a
lookbehind, then just adding a lookbehind in another branch of the
pattern will change its behaviour. You can always add (?<!)| at the
start of a pattern without having any effect ... the lookbehind always
fails, so matching just carries on with the rest of the pattern.

Something like /abc/ matched against \P\Pxyz gives no match. I do not
think it would be useful to make /(?<!)|abc/ give instead a partial
match, just because there is a lookbehind somewhere in the pattern.
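
The claim that a (?<!)| prefix has no effect on ordinary matching can be
checked with a small sketch. It uses Python's re module rather than PCRE, so
it says nothing about partial matching (which re does not support); the
helper name is made up for illustration. It only compares the plain pattern
with the prefixed one.

```python
import re


def same_result(pattern: str, subject: str) -> bool:
    """Compare `pattern` with `(?<!)|pattern` on the same subject.

    The empty negative lookbehind can never succeed, so the first
    alternative always fails and matching falls through to `pattern`.
    """
    plain = re.search(pattern, subject)
    prefixed = re.search("(?<!)|" + pattern, subject)
    if plain is None or prefixed is None:
        return plain is None and prefixed is None
    return plain.span() == prefixed.span()


if __name__ == "__main__":
    print(same_result("abc", "xyz"))      # both fail to match
    print(same_result("abc", "xxabcxx"))  # both match at the same place
```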

Another idea: the problems arise from lookaheads at the end of the
subject string when no previous characters have been inspected. Perhaps
a zero-length partial match should be allowed if it arises within a
lookahead? This means that (?!b) could cause such a match, but [^b]
would not.
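
The difference between (?!b) and [^b] at the end of the available text can be
seen in a short sketch. Again this uses Python's re module, not PCRE, and it
does no partial matching; it only shows that the lookahead succeeds without
inspecting a character, so its answer can be overturned when more input
arrives, whereas [^b] cannot match until a character is present.

```python
import re


def found(pattern: str, subject: str) -> bool:
    """True if `pattern` matches somewhere in `subject`."""
    return re.search(pattern, subject) is not None


if __name__ == "__main__":
    # Lookahead: succeeds at the end of "a", but the match disappears
    # once a following "b" becomes visible.
    print(found(r"a(?!b)", "a"), found(r"a(?!b)", "ab"))

    # Character class: needs a real character, so it cannot match at the
    # end of "a" and only succeeds once a non-"b" character is present.
    print(found(r"a[^b]", "a"), found(r"a[^b]", "ac"))
```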

I need to work through a lot more examples to see what might make sense
here.

Philip

--
Philip Hazel