Author: Philip Hazel
Date:
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] several messages
On Sun, 22 Jan 2012, ND wrote:
> This problem is part of a more general problem: if a pattern contains a
> lookbehind, then it may have useful zero-length partial matches.
Yes. It is the lookbehind that causes all the problems. But PCRE would have
to remember that the pattern contains lookbehinds - which it currently
does not.
> To illustrate, we can modify your example: the string 'abcd' is matched
> partially against 'xyz(?<=abcdxyz)'. Now it is obvious that a partial match
> is needed. What length must it have?
Sorry. I do not understand this comment. At the end of 'abcd' it will
give a zero-length partial match (if we allow it) because there is no
character to match against 'x'. Where does the length of the lookbehind
come into it? The lookbehind has not been reached yet.
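
One possible reading of the example, as a minimal sketch. It uses Python's
re module as a stand-in, not PCRE, and re has no partial matching, so it
only shows why the earlier text 'abcd' matters once 'xyz' arrives in a
later block. The variable names are illustrative.

```python
import re

# The pattern from the example: 'xyz', then a lookbehind that requires
# the seven characters before that point to be 'abcdxyz'.
pattern = re.compile(r"xyz(?<=abcdxyz)")

# First block only: there is no 'x' to match, so there is no match.
first_block = pattern.search("abcd")

# Second block on its own: 'xyz' matches, but the lookbehind cannot
# see 'abcd', so the overall match fails.
second_block_alone = pattern.search("xyz")

# Both blocks together: the lookbehind can now inspect 'abcd'.
both_blocks = pattern.search("abcd" + "xyz")

print(first_block, second_block_alone, both_blocks.span())
```

Only the last search succeeds, which is presumably why the application would
want to be told to keep 'abcd' when the first block ends.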
> It seems that this length must be equal to the maximum lookbehind length
> of the pattern. But I assumed that calculating this length is an expensive
> task for PCRE.
Actually, since PCRE must calculate the length of every lookbehind
in a pattern, keeping the maximum would not be very expensive.
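
A toy sketch of the idea of keeping a running maximum, in Python. It is not
how PCRE does it: PCRE works out each lookbehind's length while compiling,
whereas this hypothetical helper just scans the pattern text and only
handles lookbehinds whose body is plain literal characters (no nesting,
alternation, escapes or quantifiers).

```python
import re

# Matches (?<=...) or (?<!...) whose body has no parentheses,
# alternation or backslashes. This is a deliberate simplification.
SIMPLE_LOOKBEHIND = re.compile(r"\(\?<[=!]([^()|\\]*)\)")

def max_lookbehind_length(pattern):
    """Return the longest literal lookbehind body in pattern, or 0."""
    lengths = (len(m.group(1)) for m in SIMPLE_LOOKBEHIND.finditer(pattern))
    return max(lengths, default=0)

print(max_lookbehind_length("xyz(?<=abcdxyz)"))
```

Since each length has to be computed anyway, remembering the largest one
costs one comparison per lookbehind.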
> Thus I suggest that if a pattern contains a lookbehind, then:
> - pcre_exec can return zero-length partial hard matches
I guess this is less dangerous than always allowing it.
> - PCRE must tell the main application (maybe at compile time) that the
> pattern contains a lookbehind
If PCRE remembers lookbehinds, then a PCRE_INFO option could do that. I
presume you want this so that the application can expect zero-length
partial matches.
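
A rough sketch of what an application might do with that information when
it feeds a subject in blocks. It is written in Python with the re module,
not PCRE. The function name and the max_lookbehind parameter are
hypothetical: the parameter stands in for whatever value the library might
report. Because re has no partial matching, "zero-length partial match at
the end of the block" is modelled simply as "no match yet, keep the tail".
Matches that begin in one block and finish in the next are not handled;
that needs a real partial match.

```python
import re

def find_in_chunks(pattern, chunks, max_lookbehind):
    """Search chunks in order, retaining a tail for lookbehinds.

    Returns (start, end) offsets in the whole subject, or None.
    """
    regex = re.compile(pattern)
    retained = ""   # tail of earlier text kept for lookbehinds
    dropped = 0     # characters discarded before the current window
    for chunk in chunks:
        window = retained + chunk
        # Start matching after the retained text. A lookbehind may
        # still inspect characters before the start position.
        match = regex.search(window, len(retained))
        if match:
            return dropped + match.start(), dropped + match.end()
        # No match: keep only as much as a lookbehind could need.
        keep = window[-max_lookbehind:] if max_lookbehind > 0 else ""
        dropped += len(window) - len(keep)
        retained = keep
    return None

print(find_in_chunks(r"xyz(?<=abcdxyz)", ["abcd", "xyz"], 7))
print(find_in_chunks(r"xyz(?<=abcdxyz)", ["abcd", "xyz"], 0))
```

With seven characters retained the match is found at offsets 4 to 7 of the
whole subject; with nothing retained it is missed.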