Re: [pcre-dev] 'Hard' partial matching don't work with some …

Author: Philip Hazel
Date:  
To: ND
CC: Pcre-dev@exim.org
Subject: Re: [pcre-dev] 'Hard' partial matching don't work with some assertions
On Sat, 24 Mar 2012, ND wrote:

> Please let me know whether there has been any progress in correcting PCRE's
> behaviour when a lookahead is at the end of the subject string, 'partial
> hard' is set, and no characters have been inspected before it.


Nothing has changed.

> >PCRE version 8.21 2011-12-12
> >/(?<=a)(?!b)/+
> >\P\Pa
> >Partial match: a
> >Now we swap assertions:
> >PCRE version 8.21 2011-12-12
> >/(?!b)(?<=a)/+
> >\P\Pa
> > 0:
> > 0+
> >The result changes to 'match'. Expected 'partial match'. Can swapping the
> >assertions like this really change the result?


This happens because PCRE never returns a partial match for an empty
string: if it did, it could *always* report an empty partial match at
the end of the subject.

In the first case, because it has looked behind for "a", it can return a
partial match. In the second case, it runs out of characters before it
has inspected anything, and so it cannot give a partial match.

If you are doing multi-segment matching, I guess you have to assume that
a "match" for an empty string at the end of the subject is really a
partial match, and use the PCRE_INFO_MAXLOOKBEHIND feature (which will
be in 8.31; it is already in the SVN repository) to determine how many
characters to save from the previous segment.
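
A rough sketch of that workaround in plain C, for illustration only. The
names seg_outcome, classify() and keep_tail() are hypothetical and not part
of PCRE. It assumes the caller has already run pcre_exec() with
PCRE_PARTIAL_HARD and obtained the lookbehind length from pcre_fullinfo()
with PCRE_INFO_MAXLOOKBEHIND, and that the subject is in non-UTF 8-bit mode,
so one character is one byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome labels for this sketch; not PCRE constants. */
enum seg_outcome { SEG_NO_MATCH, SEG_MATCH, SEG_NEED_MORE };

/*
 * Interpret one matching attempt on a segment.
 *   matched     nonzero if pcre_exec() reported a complete match
 *   partial     nonzero if pcre_exec() returned PCRE_ERROR_PARTIAL
 *   start, end  offsets of the match (ovector[0], ovector[1])
 *   subject_len length of the segment that was scanned
 *   more_input  nonzero if further segments may still arrive
 */
static enum seg_outcome classify(int matched, int partial,
                                 size_t start, size_t end,
                                 size_t subject_len, int more_input)
{
    if (partial)
        return SEG_NEED_MORE;
    if (!matched)
        return SEG_NO_MATCH;
    /* An empty match sitting exactly at the end of the segment is
       treated as a partial match while more input may follow. */
    if (more_input && start == end && end == subject_len)
        return SEG_NEED_MORE;
    return SEG_MATCH;
}

/*
 * Copy the last max_lookbehind bytes of the previous segment into out,
 * so that lookbehind assertions can still see them when the next
 * segment is appended. Returns the number of bytes kept.
 */
static size_t keep_tail(const char *prev, size_t prev_len,
                        size_t max_lookbehind,
                        char *out, size_t out_cap)
{
    size_t keep = prev_len < max_lookbehind ? prev_len : max_lookbehind;
    if (keep > out_cap)
        keep = out_cap;
    memcpy(out, prev + (prev_len - keep), keep);
    return keep;
}
```

For /(?!b)(?<=a)/ with the segment "a", the empty match at offset 1 would be
classified as SEG_NEED_MORE while more input is expected, and one byte ("a")
would be kept. The next attempt would then run on the kept bytes followed by
the new segment, starting at the offset where the new segment begins.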

We want to let 8.30 settle down for a while, so that we can fix the
problems that are inevitable after such big changes, before releasing
8.31, which will probably come out sometime in May.

Philip

--
Philip Hazel