Re: [pcre-dev] several messages

Author: Philip Hazel
Date:  
To: ND, Zoltán Herczeg
CC: Pcre-dev
Old-Topics: Re: [pcre-dev] 'Hard' partial matching don't work with some assertions
Subject: Re: [pcre-dev] several messages
Remember that pcre_exec() returns whichever of a full match or a hard
partial match it finds first.

On Sun, 22 Jan 2012, ND wrote:

> PCRE version 8.21 2011-12-12
> /(?<=a)(?!b)/+
> \P\Pa
> Partial match: a


The lookbehind works; in the lookahead a partial match is forced because
the next character is not available *and* at least one character (the
"a" checked by the lookbehind) has been inspected.

> Now we swap assertions:
>
> PCRE version 8.21 2011-12-12
> /(?!b)(?<=a)/+
> \P\Pa
> 0:
> 0+


In this case a partial match is not forced in the lookahead because no
characters have been inspected. So the lookahead succeeds and the rest
of the pattern matches.

> Another example:
>
> PCRE version 8.21 2011-12-12
> /(?!a)/+
> \P\Pa
> 0:
> 0+


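Same thing: at the end of the subject the lookahead has inspected no
characters, so no partial match is forced and the empty string matches.
A partial match can never be an empty string.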
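The three results above can be reproduced with a small C model of the rule described here. It scans start positions in order and returns the first outcome it finds, full or hard partial, mirroring the way pcre_exec() returns whichever it finds first. A hard partial match is forced only when a lookahead needs a character beyond the end of the subject *and* the current attempt has already inspected at least one character. This is a simplified sketch, not PCRE's implementation: the names `item`, `match_hard_partial()` and the rest are hypothetical, and only single-character `(?<=c)` and `(?!c)` assertions are supported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not PCRE source code. Patterns consist only of
   one-character assertions: (?<=c) and (?!c). */
typedef enum { LOOKBEHIND, NEG_LOOKAHEAD } item_kind;

typedef struct {
  item_kind kind;
  char c;
} item;

typedef enum { NO_MATCH, FULL_MATCH, HARD_PARTIAL } result;

/* Try each start position in turn and return the first outcome found,
   whether a full match or a hard partial match. *where receives the
   start position of the attempt that produced the result. */
static result match_hard_partial(const item *items, size_t n_items,
                                 const char *subject, size_t *where) {
  assert(items != NULL && subject != NULL && where != NULL);
  size_t len = strlen(subject);

  for (size_t start = 0; start <= len; start++) {
    size_t inspected = 0; /* characters looked at in this attempt */
    int ok = 1;

    for (size_t i = 0; i < n_items && ok; i++) {
      if (items[i].kind == LOOKBEHIND) {
        if (start == 0) { ok = 0; break; } /* nothing behind the start */
        inspected++;
        ok = subject[start - 1] == items[i].c;
      } else { /* NEG_LOOKAHEAD */
        if (start == len) {
          /* The lookahead needs a character that is not available. */
          if (inspected > 0) { *where = start; return HARD_PARTIAL; }
          /* Nothing inspected yet: (?!c) simply succeeds at the end. */
          continue;
        }
        inspected++;
        ok = subject[start] != items[i].c;
      }
    }
    if (ok) { *where = start; return FULL_MATCH; }
  }
  return NO_MATCH;
}
```

With the subject "a", the model returns HARD_PARTIAL for `(?<=a)(?!b)` and an empty FULL_MATCH at offset 1 for both `(?!b)(?<=a)` and `(?!a)`, matching the pcretest output quoted above. Note that `*where` is only the start of the attempt, whereas pcretest showed the partial match as "a", which includes the character inspected by the lookbehind.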

On Sun, 22 Jan 2012, Zoltán Herczeg wrote:

> But you want a complete behaviour change for your specific use case
> which would break compatibility, although they have some reasons.


Indeed.

> As this would be a compatibility-breaking new feature, we should
> probably aim for 8.31. The best thing would be to open a bug and
> discuss this new behaviour. And let other people tell their opinion
> which usually takes time...
>
> To summarize what you have said so far, if hard partial matching is enabled:
> - \z (and perhaps \Z and $) must never match at the end of the string
> - A match must not be allowed at the end of the subject string


If a new feature is added (or the behaviour of hard partial is changed),
the only possibility would be to return no match.

A thought: we already have the PCRE_NOTEOL flag, but it is documented
like this:

  PCRE_NOTEOL

This option specifies that the end of the subject string is not the
end of a line, so the dollar metacharacter should not match it nor
(except in multiline mode) a newline immediately before it. Setting
this without PCRE_MULTILINE (at compile time) causes dollar never to
match. This option affects only the behaviour of the dollar
metacharacter. It does not affect \Z or \z.

I wonder why it does not affect \Z or \z? I further wonder if it should
be made to affect \Z and \z when hard partial matching is happening?
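The documented behaviour, and the change being wondered about here, can be summarised in a small C model of the three end-of-subject assertions. It assumes non-multiline mode, "\n" as the only newline, and no PCRE_DOLLAR_ENDONLY. All names are hypothetical, and the `extend_noteol` flag is purely illustrative: it is not a PCRE option, it merely stands for the possible change of letting PCRE_NOTEOL affect \Z and \z during hard partial matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of end-of-subject assertions, not PCRE source code.
   Assumes non-multiline mode, "\n" as newline, no PCRE_DOLLAR_ENDONLY. */
typedef enum { DOLLAR, BIG_Z, SMALL_Z } end_assertion;

typedef struct {
  bool noteol;        /* models PCRE_NOTEOL */
  bool hard_partial;  /* models PCRE_PARTIAL_HARD */
  bool extend_noteol; /* HYPOTHETICAL: NOTEOL also disables \Z and \z
                         when hard partial matching is active */
} match_flags;

static bool end_assertion_matches(end_assertion a, const char *subject,
                                  size_t pos, match_flags f) {
  size_t len = strlen(subject);
  assert(pos <= len);
  bool at_end = pos == len;
  bool before_final_nl = pos + 1 == len && subject[pos] == '\n';
  bool proposed_block = f.extend_noteol && f.noteol && f.hard_partial;

  switch (a) {
  case DOLLAR:
    /* Documented: NOTEOL without multiline means $ never matches. */
    if (f.noteol) return false;
    return at_end || before_final_nl;
  case BIG_Z:
    /* Documented: NOTEOL does not affect \Z. */
    if (proposed_block) return false;
    return at_end || before_final_nl;
  case SMALL_Z:
    /* Documented: NOTEOL does not affect \z. */
    if (proposed_block) return false;
    return at_end;
  }
  return false;
}
```

For example, with the subject "ab\n" at offset 2, `$` matches by default but not with NOTEOL, while `\Z` still matches; that is exactly the asymmetry the documentation describes.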

Philip

--
Philip Hazel